Designing a Connected Vehicle Platform on Cloud IoT Core

This solution examines managing connected vehicles with usage-based insurance using Cloud IoT Core on Google Cloud Platform (GCP).

Vehicles are transforming from individual, self-contained, transportation-focused objects to sophisticated, Internet-connected endpoints, often capable of two-way communication. The new data streams generated by modern connected vehicles drive innovative business models such as usage-based insurance, enable new in-vehicle experiences, and build the foundation for advances in vehicle technology such as autonomous driving and vehicle-to-vehicle (V2V) communication.

GCP provides a robust computing platform that takes advantage of Google's end-to-end security model for building and operating connected vehicle platforms.

Data types

Connected vehicle data is composed of a broad set of sensor and usage data, such as:

  • Vehicle location. GPS coordinates, speed limit, accelerometer, compass orientation.
  • Drivetrain metrics. Drive status, engine RPM, engine temperature, fuel level, fault codes.
  • Vehicle environment status. Cabin/external temperature, rain detection, humidity.
  • Custom sensors. Cameras, third-party tracking services including payload temperatures, location, speed, damaging impacts.
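As a rough illustration of how these data types might be carried in a single message, the following sketch defines one telemetry record and serializes it to JSON. The field names, units, and the TelemetryEvent class are assumptions made for this example; the solution does not prescribe a schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class TelemetryEvent:
    """One telemetry sample from a vehicle gateway (hypothetical schema)."""
    vin: str                      # vehicle identification number
    timestamp: float              # seconds since the Unix epoch, UTC
    latitude: float               # GPS coordinates, decimal degrees
    longitude: float
    speed_kph: float              # vehicle speed
    heading_deg: float            # compass orientation, 0-360
    engine_rpm: Optional[int] = None
    engine_temp_c: Optional[float] = None
    fuel_level_pct: Optional[float] = None
    fault_codes: List[str] = field(default_factory=list)
    cabin_temp_c: Optional[float] = None

    def to_json(self) -> str:
        """Serialize to compact JSON, dropping fields that were not reported."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, separators=(",", ":"), sort_keys=True)


# Placeholder values, not real vehicle data.
event = TelemetryEvent(
    vin="TESTVIN0000000001", timestamp=1500000000.0,
    latitude=37.42, longitude=-122.08, speed_kph=62.5, heading_deg=270.0,
    engine_rpm=2100, fault_codes=["P0301"],
)
payload = event.to_json()
```

Optional fields let one record type cover vehicles that carry different sensor sets without sending empty values over the wire.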

Use cases

A connected vehicle combined with the data storage and analytics capabilities of the cloud enables many new automotive use cases, including:

Usage-based insurance. Insurance policies historically have been based mostly on how much you drive, but with advanced telemetry and sensor data, it is possible to incorporate actual driving behaviors into insurance risk models. These behaviors include acceleration and deceleration, speed compared to speed limits, and types of driving, such as commuting on freeways compared to commuting on surface streets. Insurers can base policies on observed driving behavior, which means that safe drivers can be rewarded with lower insurance premiums.
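To make the idea of behavior-based signals concrete, here is a minimal sketch that reduces one trip to two features: a count of hard-braking events and the share of samples recorded above the speed limit. The Sample type, the trip_features function, and the 12 km/h per second braking threshold are assumptions chosen for illustration, not values from an actual risk model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    t: float            # seconds since trip start
    speed_kph: float    # measured vehicle speed
    limit_kph: float    # posted speed limit at this location


def trip_features(samples: List[Sample], hard_brake_kph_per_s: float = 12.0) -> dict:
    """Summarize one trip as hard-braking count and share of samples over the limit."""
    hard_brakes = 0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        decel = (prev.speed_kph - cur.speed_kph) / dt
        if decel >= hard_brake_kph_per_s:
            hard_brakes += 1
    speeding = sum(1 for s in samples if s.speed_kph > s.limit_kph)
    return {
        "hard_brakes": hard_brakes,
        "speeding_fraction": speeding / len(samples) if samples else 0.0,
    }


# Four one-second samples: a brief overspeed followed by a sharp slowdown.
trip = [
    Sample(0, 50, 50), Sample(1, 52, 50), Sample(2, 38, 50), Sample(3, 37, 50),
]
features = trip_features(trip)
```

In this trip, the drop from 52 to 38 km/h within one second counts as the only hard-braking event, and one of the four samples is over the limit.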

Predictive maintenance. By actively monitoring telemetry data, you can combine the telemetry data with other data sources, such as expected operational tolerances or previous operational patterns, to better identify possible failures. Suppliers can identify issues with parts and operational components. Drivers have a whole new automated service experience: the vehicle can alert the driver of recommended maintenance and then recommend a location, date, and time for an appointment.

Freight tracking. Tracking shipments is a challenge that is important both for consumers and for enterprises. Consumers who know exactly when their order will arrive have a more positive customer experience. Enterprises have myriad use cases. One example is tracking goods through the supply chain, a challenge in food production. With sensors and the connected platform, you can capture processing facilities information, date of processing, and verify that the goods stayed within a given temperature tolerance.
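The temperature check described above could be sketched as follows. The reading format, the temperature_excursions function, and the 2 to 8 degree band are assumptions for this example only.

```python
from typing import Iterable, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, temperature in degrees C)


def temperature_excursions(readings: Iterable[Reading],
                           low_c: float, high_c: float) -> List[Reading]:
    """Return every reading that falls outside the inclusive [low_c, high_c] band."""
    return [(t, temp) for t, temp in readings if not (low_c <= temp <= high_c)]


# Placeholder readings taken every ten minutes.
readings = [(0, 3.5), (600, 4.1), (1200, 8.2), (1800, 4.0)]
excursions = temperature_excursions(readings, low_c=2.0, high_c=8.0)
within_tolerance = not excursions
```

Returning the offending readings, instead of only a pass or fail flag, keeps the evidence of when the shipment left its tolerance band.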

Customized in-vehicle experience. Many consumer vehicles already offer Internet connectivity, entertainment options, and interaction with the vehicle using a smartphone. Through analysis of the streams of vehicle data, you can further tailor the in-vehicle experience to specific driving behaviors, common travel destinations, and frequent searches. For example, the vehicle could automatically pre-heat or cool the cabin prior to a daily commute based on common daily driving patterns.


A single vehicle can produce upwards of 560 GB of data per day. This deluge of data represents both an opportunity to derive value from the continuous stream and a challenge in processing and analyzing data at this scale.
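A quick back-of-the-envelope calculation shows what the 560 GB figure implies if that volume were spread evenly over a day. The decimal gigabyte and the fleet size of one million vehicles are assumptions for illustration.

```python
GB = 10**9                       # decimal gigabyte, an assumption about the unit
SECONDS_PER_DAY = 24 * 60 * 60

per_vehicle_bytes_per_day = 560 * GB
# Average rate if the data were produced evenly across the day.
per_vehicle_mb_per_s = per_vehicle_bytes_per_day / SECONDS_PER_DAY / 10**6

fleet_size = 1_000_000           # hypothetical fleet, for illustration only
fleet_pb_per_day = per_vehicle_bytes_per_day * fleet_size / 10**15
```

Under these assumptions, one vehicle averages roughly 6.5 MB per second, and a million-vehicle fleet would generate about 560 PB per day. These numbers describe data produced on the vehicle, which is not necessarily the volume uploaded to the cloud.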

The main challenges in developing a platform to connect and manage vehicle data include:

  • Device management. To connect devices to any platform, you must be able to authenticate, authorize, push updates, configure, and monitor software. These services must scale to millions of devices and provide persistent availability.
  • Data ingestion. The platform must reliably receive, process, and store messages.
  • Data analytics. You must be able to perform complex analysis of time-series data generated by devices to gain insights into events, tolerances, trends, and possible failures.
  • Applications. Developers must create business-level application logic. This logic must integrate with existing data sources, including third-party sources and on-premises data centers.
  • Predictive models. You must have predictive models based on current and historical data in order to predict business-level outcomes.

To derive value from vehicle data, you must be able to ingest, store, and process device data at scale. You must process data securely throughout the platform, and you must be able to scale processing, storage, and analytical applications to handle the amount of data generated from millions of devices in various geographies. You also need rapid data analysis, advanced predictive capabilities, and machine learning feedback loops for applications.

GCP provides a robust platform for data ingestion, Internet of Things (IoT) device management, storage, analysis, and machine learning predictions. Centralized device management of a gateway per vehicle simplifies the control plane and data plane for sensors and data sources while helping to provide security and operational boundaries with cloud-based systems.

Usage-based insurance requirements and design

This section discusses the requirements and architecture specifically designed for the usage-based insurance use case.

Architecture diagram

Figure: connected car architecture

Device management

In the usage-based insurance use case, vehicles are equipped with Internet-connected telemetry devices that report a series of low-level vehicle events recorded during each trip. Data is uploaded in real time or at the conclusion of each trip. Example data sets include vehicle speed, GPS location, and exception events. The vehicle devices can communicate using MQTT, an industry-standard communications protocol. Each device must be mutually authenticated prior to exchanging data and must be associated with a specific VIN. Communication with the telemetry devices is two-way, because the devices both supply data and accept messages.

Cloud IoT Core's device manager provides a scalable and flexible solution to manage vehicles as IoT devices. During device registration, device manager records the device's public key. When the device later sends messages, that key is used to authorize the device's connection and help establish secure communications. After the devices are registered, device manager monitors them and lets you issue remote management commands to them. Device manager is exposed through an API, so the usage-based insurance application can integrate with device management to register and deregister vehicle devices.
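The following sketch shows the identifiers a vehicle device would present when connecting, under some assumptions. A device connecting to the Cloud IoT Core MQTT bridge uses its full device path as the MQTT client ID and supplies a JSON Web Token whose audience claim is the project ID. The mapping from VIN to device ID, the project, region, and registry names, and the one-hour token lifetime are hypothetical. Signing the token with the device's private key requires a JWT library and is left out here.

```python
import time
from typing import Optional


def device_id_for_vin(vin: str) -> str:
    """Map a VIN to a device ID (hypothetical naming scheme)."""
    return "vin-" + vin.upper()


def mqtt_client_id(project: str, region: str, registry: str, device_id: str) -> str:
    """Full device path that the device presents as its MQTT client ID."""
    return (f"projects/{project}/locations/{region}"
            f"/registries/{registry}/devices/{device_id}")


def jwt_claims(project: str, lifetime_s: int = 3600, now: Optional[int] = None) -> dict:
    """Claims for the device token: issued-at, expiry, and project as audience."""
    issued = int(time.time()) if now is None else now
    return {"iat": issued, "exp": issued + lifetime_s, "aud": project}


# Placeholder project, region, registry, and VIN.
client_id = mqtt_client_id("my-project", "us-central1", "ubi-vehicles",
                           device_id_for_vin("testvin0000000001"))
claims = jwt_claims("my-project", now=1_500_000_000)
```

Deriving the device ID from the VIN keeps the association between device and vehicle explicit, which the application needs when it registers or deregisters a vehicle.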

Data ingestion

Each low-level message from each vehicle must be transmitted, processed, and then stored for later processing and analytics. The messages are streamed over MQTT connections, either in real time or at the end of each driving session. All event data is stored for use with the usage-based insurance application and detailed analytics.

Cloud IoT Core's protocol bridge provides communication with the vehicle devices using MQTT. After the vehicle endpoints are authenticated, the protocol bridge accepts messages and forwards each to Google Cloud Pub/Sub. Cloud Pub/Sub is a globally scalable message queueing system, making it an excellent choice to handle the streams of vehicle data while at the same time decoupling the specifics of the backend processing implementation.
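A minimal sketch of the device side of this flow follows. Devices publish telemetry to the /devices/{device-id}/events topic on the MQTT bridge. The payload schema and the RecordingClient class are assumptions; RecordingClient stands in for a real MQTT client so that the example runs without a network connection.

```python
import json


def telemetry_topic(device_id: str) -> str:
    """MQTT topic on which a device publishes telemetry events."""
    return f"/devices/{device_id}/events"


def encode_event(vin: str, timestamp: float, speed_kph: float) -> bytes:
    """Encode one event as UTF-8 JSON (hypothetical payload schema)."""
    body = {"vin": vin, "ts": timestamp, "speed_kph": speed_kph}
    return json.dumps(body, sort_keys=True).encode("utf-8")


class RecordingClient:
    """Stand-in for an MQTT client; it only records what would be published."""

    def __init__(self):
        self.published = []

    def publish(self, topic: str, payload: bytes, qos: int = 1) -> None:
        self.published.append((topic, payload, qos))


client = RecordingClient()
client.publish(telemetry_topic("vin-TESTVIN0000000001"),
               encode_event("TESTVIN0000000001", 1500000000.0, 62.5))
```

Because the bridge forwards each message to Cloud Pub/Sub, the device only needs to know its own topic and never addresses the backend processing services directly.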

Google Cloud Dataflow is used to transform, enrich, and then store telemetry data by using distributed data pipelines. Cloud Dataflow integrates with other GCP components, such as Google Cloud Bigtable for storage. Cloud Bigtable provides a scalable NoSQL database service with consistent low latency and high throughput, making it an ideal choice for storing and processing time-series vehicle data. The enriched, raw device data is initially stored in Cloud Bigtable for later application and analytical data processing in Google BigQuery.
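Row key design largely determines how well time-series data performs in a wide-column store such as Cloud Bigtable. The sketch below shows one commonly used pattern, assumed here and not taken from this solution: lead with the VIN and append a reversed, zero-padded timestamp. The row_key function and the key layout are hypothetical.

```python
MAX_TS_MS = 10**13 - 1  # largest 13-digit millisecond timestamp


def row_key(vin: str, timestamp_ms: int) -> bytes:
    """Build a row key of the form VIN#reversed-timestamp.

    Leading with the VIN spreads writes across vehicles instead of piling them
    onto the newest timestamp, and the reversed, zero-padded timestamp makes a
    vehicle's most recent events sort first.
    """
    if not 0 <= timestamp_ms <= MAX_TS_MS:
        raise ValueError("timestamp_ms out of range")
    reversed_ts = MAX_TS_MS - timestamp_ms
    return f"{vin}#{reversed_ts:013d}".encode("ascii")


# Two events from the same vehicle, one minute apart.
older = row_key("TESTVIN0000000001", 1_500_000_000_000)
newer = row_key("TESTVIN0000000001", 1_500_000_060_000)
```

With this layout, a prefix scan on the VIN returns a vehicle's events newest first, which suits queries such as "the latest trip for this vehicle".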

Two-way communication

Bidirectional communication with the vehicle devices can be used in many ways. Examples include requesting data that hasn't yet been uploaded and providing configuration updates for the device, such as the types of events and data to report. Based on the type of insurance, vehicles might limit the data they report to exception events such as abrupt acceleration or deceleration, or they might send all possible data. A configuration update causes the device to transmit the requested data types for the given vehicle.

Cloud IoT Core's device manager provides the ability to establish two-way communication with managed devices through device configuration updates. When you update the configuration associated with a specific device, device manager pushes the configuration changes to the vehicle's device.
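On the device side, handling a configuration update might look like the sketch below. Devices receive configuration on the /devices/{device-id}/config topic, and the configuration body is an opaque blob whose format is up to the application. The JSON schema with a "report" setting, and the function names, are assumptions for illustration.

```python
import json
from typing import Iterable, List

DEFAULT_CONFIG = {"report": "exceptions"}  # hypothetical config schema


def config_topic(device_id: str) -> str:
    """MQTT topic on which a device receives configuration updates."""
    return f"/devices/{device_id}/config"


def parse_config(payload: bytes) -> dict:
    """Decode a JSON config, falling back to defaults on empty or bad input."""
    try:
        config = json.loads(payload.decode("utf-8")) if payload else {}
    except (UnicodeDecodeError, json.JSONDecodeError):
        config = {}
    if not isinstance(config, dict):
        config = {}
    return {**DEFAULT_CONFIG, **config}


def events_to_report(events: Iterable[dict], config: dict) -> List[dict]:
    """Keep all events, or only those flagged as exceptions, per the config."""
    if config.get("report") == "all":
        return list(events)
    return [e for e in events if e.get("exception")]


events = [{"type": "speed", "exception": False},
          {"type": "hard_brake", "exception": True}]
exceptions_only = events_to_report(events, parse_config(b""))
everything = events_to_report(events, parse_config(b'{"report": "all"}'))
```

Falling back to a conservative default when the configuration is empty or malformed keeps a device reporting exception events even if an update is corrupted.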

Data analytics

Vehicle data is aggregated and then combined with corporate data about the vehicle and customer's policy. The data volume scales with the number of vehicles. The enriched data is stored and then sent to the usage-based insurance application to apply business-level rules to customer accounts based on results of the analytics.

Cloud Dataflow provides the ability to process data pipelines that combine the vehicle device data with the corporate vehicle and customer data, and then store the combined data in BigQuery. BigQuery provides the powerful analytics engine for the usage-based insurance application and for out-of-band system analytics. Both types of analysis are important: the business process analytics to support the complex business rules of the usage-based insurance application and the system analytics to gain insights into overall system behavior.
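The aggregate-then-join logic of such a pipeline can be sketched in plain Python as follows. This is a stand-in for the pipeline's logic, not Cloud Dataflow code, and the event fields, policy records, and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_by_vin(events: Iterable[dict]) -> Dict[str, dict]:
    """Sum distance and count exception events per vehicle."""
    totals: Dict[str, dict] = defaultdict(lambda: {"km": 0.0, "exceptions": 0})
    for e in events:
        totals[e["vin"]]["km"] += e.get("km", 0.0)
        totals[e["vin"]]["exceptions"] += 1 if e.get("exception") else 0
    return dict(totals)


def enrich(totals: Dict[str, dict], policies: Dict[str, dict]) -> List[dict]:
    """Join per-vehicle totals with policy records; skip vehicles with no policy."""
    rows = []
    for vin, agg in sorted(totals.items()):
        policy = policies.get(vin)
        if policy is None:
            continue
        rows.append({"vin": vin, **agg, **policy})
    return rows


# Placeholder events and policy data.
events = [
    {"vin": "A", "km": 12.5, "exception": False},
    {"vin": "A", "km": 7.5, "exception": True},
    {"vin": "B", "km": 3.0, "exception": False},
]
policies = {"A": {"policy_id": "P-1", "plan": "ubi-standard"}}
rows = enrich(aggregate_by_vin(events), policies)
```

Each resulting row corresponds to one record that the pipeline would write to the analytics store. Vehicle B is dropped here because it has no matching policy; a production pipeline might route such records to a separate output for review.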

Machine learning

As data volumes grow and data analytics are performed on the incoming data, it is possible to build machine learning models that algorithmically associate driver behaviors with past insurance claims to better classify driver risk profiles. Deep neural networks allow models to use hundreds or thousands of different input signals either to classify drivers into risk categories or to produce a risk score. Over time, these models can allow the insurer to offer more tailored policies, offer drivers premiums based on their driving habits, and offer safe-driving tips to drivers.

TensorFlow and Cloud Machine Learning (ML) Engine provide a sophisticated modeling framework and scalable execution environment. TensorFlow is a popular machine learning framework originally developed by Google Brain and subsequently released as open source on GitHub. TensorFlow is used to develop custom deep neural network models and is optimized for performance, flexibility, and scale, all of which are critical when leveraging IoT-generated data. One of the challenges with using machine learning at scale is securing the computing power necessary to train models. Cloud ML Engine provides a scalable environment in which to train TensorFlow models by using specialized Google computing infrastructure hardware, including GPUs. Cloud ML Engine can scale to rapidly train models that might otherwise take days. After the model is trained, Cloud ML Engine can use the model to make predictions either in batch or in real time, making the service extremely flexible.

In our usage-based insurance use case, TensorFlow is used to create models with deep neural networks trained using the data received from the vehicle. The outputs of these models include the probability of a claim for a given vehicle and the predicted magnitude of the claim. This data is used as a signal in usage-based insurance risk modeling.
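To show the shape of such a model's inputs and outputs, the sketch below runs the forward pass of a tiny two-output network in plain Python. It is not TensorFlow code and not a trained model: the two input features, the layer sizes, and every weight are made-up placeholders chosen only so that the example runs.

```python
import math
from typing import List, Tuple


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    """Rectified linear activation, applied element by element."""
    return [max(0.0, a) for a in v]


def predict(features: List[float]) -> Tuple[float, float]:
    """Return (claim probability, predicted claim magnitude) for one vehicle.

    All weights below are made-up placeholders, not trained values.
    """
    hidden = relu(dense(features, [[0.8, 1.5], [-0.4, 2.0]], [0.0, -0.1]))
    logit = dense(hidden, [[1.2, 0.9]], [-2.0])[0]
    magnitude = relu(dense(hidden, [[500.0, 1500.0]], [100.0]))[0]
    probability = 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps logit to (0, 1)
    return probability, magnitude


# Features: [hard brakes per 100 km, fraction of time over the speed limit]
safe = predict([0.2, 0.01])
risky = predict([3.0, 0.30])
```

The sigmoid output is a claim probability between 0 and 1, and the second output is a non-negative claim magnitude. A real model would learn its weights from historical telemetry and claims data and would use far more input signals.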

Application and presentation

The core application is a usage-based insurance application that evaluates the incoming vehicle data against business process rules to determine what policy pricing is provided. A component of the application integrates with the device management function when a customer signs up for or discontinues usage-based insurance. Many existing systems, such as those holding customer, vehicle, and policy data, run in a corporate data center or on-premises and are integrated as part of the usage-based insurance application.

GCP provides a range of compute options, including virtual machines through Google Compute Engine, containers through Google Container Engine, and platform-as-a-service with Google App Engine. Container Engine is used to run and manage containers, which provide flexibility and high performance for the core functionality of the application. Compute Engine offers a range of machine types that make it an ideal service for the integration components of the application architecture. Running backend services on standard containers provides a high degree of flexibility while taking advantage of the scalability offered by a microservices architecture. App Engine is used to provide the frontend services for the consumer mobile and web applications because of its scalability, integrated services, and simplicity in serving both web and mobile clients.

Next steps
