Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Securing cloud-connected devices with Cloud IoT and Microchip

Thursday, May 31, 2018

By Antony Passemard, Head of Product, Cloud IoT

Maintaining the security of your products, devices, and live code is a perpetual necessity. Every year, researchers (and hackers!) unearth new flaws, and occasionally one proves especially worrisome. Examples are the “Meltdown” and “Spectre” vulnerabilities, which Google’s Project Zero team, among others, discovered in 2017 and which were disclosed in early 2018.

Many companies believe they are too small or too inconsequential to ever be a target. In a distributed denial-of-service (DDoS) attack, however, attackers compromise as many arbitrary hosts as possible and use them to hit a specific target. It doesn’t matter who owns a compromised host: the attacker will use whatever local resources it offers to do damage, whether that means compute and bandwidth, or exposing assets and personal information about its users. The “Mirai” attack on IoT devices didn’t target anyone in particular. Instead, it aimed to take over connected devices and deploy them in rogue, massively distributed DDoS attacks.

Security cannot be an afterthought. The best course of action for any company building connected devices is to apply a combination of strong identity, encryption, and access control. In the world of IoT, this process is not as simple as it sounds.

Here we present the story of Acme, a hypothetical company planning to launch a new generation of connected devices.

Acme has several work streams for its project: mechanical design, PCB design, supply chain, firmware development, network connectivity, cloud back end, mobile and web applications, data processing and analytics, and support. Let’s look at what each of these work streams demands in terms of security, starting with the application layer.

Application layer security

At this layer, where the backend and user applications are delivered, the security models are well understood—access controls via permissions, roles, strong passwords, encryption in transit and at rest, logging, and monitoring all provide a very good set of security measures. The main problem today is deciding how a company should best get its data into the cloud securely.

Data encryption

Encryption starts with Transport Layer Security (TLS), which ensures that traffic between two parties is indecipherable to any potential eavesdropper. TLS is used very commonly for accessing websites—your bank’s site included—to ensure encryption of all transmitted data, keeping it safe from any prying eyes. Understandably, Acme wants to implement TLS for its devices as well as its services.

There is a catch: when you connect to your bank, the TLS session authenticates only the bank, not you or your machine. Once the TLS session is in place, you typically enter a username and password. That password can be changed, and it is stored in your head (please don’t keep a sticky note reminder under your keyboard). Because you have to recall the password and type it in yourself, the verifier has evidence that you are actually present at the other end of the connection. In effect, you are saying, “Here I am, and here is my password.” A device, however, is not a person. A device sending a password proves only that it has the password, not that it is actually the expected device trying to authenticate. It’s like someone stealing your sticky note with your password on it. To address this issue, Acme will install certificates on its devices.

A certificate relies on asymmetric cryptography, which implies a separation of roles. The party issuing the certificate (the Certificate Authority) vouches for the link between the physical device and its public key. Having the public key alone is not enough to impersonate the device: authentication requires proof of possession of the matching private key, which only the device holds. Furthermore, to authenticate the device, the verifier never needs to receive anything of value (like a password) that could be stolen and reused. This is a much higher level of security, but unfortunately it brings additional complexity into the picture. The good news is that machines are good at both automating repetitive tasks and handling complexity.
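The separation of roles can be made concrete with a deliberately tiny challenge-response sketch. It uses textbook RSA with toy primes (61 and 53), which is completely insecure and is not what Cloud IoT Core or the ATECC608A actually use. The function names are hypothetical. The point is only that the verifier checks the response using public values, and that a fresh random challenge makes a recorded response useless later, unlike a copied password.

```python
import hashlib
import secrets

# Toy textbook-RSA parameters (p = 61, q = 53). Far too small to be secure;
# they only make the arithmetic visible.
N = 61 * 53                          # public modulus (3233)
E = 17                               # public exponent
D = pow(E, -1, (61 - 1) * (53 - 1))  # private exponent (2753); Python 3.8+

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def device_sign(challenge: bytes) -> int:
    """Runs on the device: only the holder of D can produce this value."""
    return pow(digest(challenge), D, N)

def verifier_check(challenge: bytes, signature: int) -> bool:
    """Runs in the cloud: needs only the public values (N, E), nothing secret."""
    return pow(signature, E, N) == digest(challenge)

# The verifier sends a fresh random challenge each time, so a recorded
# response is tied to a challenge that will not be issued again.
challenge = secrets.token_bytes(16)
response = device_sign(challenge)
print(verifier_check(challenge, response))  # True
```

With real key sizes, the same structure holds: the verifier stores nothing that would let it, or anyone who breaches it, produce a valid response.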

Device identity

How does Acme use certificates for its devices? It needs its own Certificate Authority (CA). Acme can buy a root certificate from a CA provider and set up its own Authority. The CA has a root certificate and a private key that must be closely guarded: in the digital era, this key is the key to the kingdom. It can be used to create an intermediate CA whose purpose is to sign certificates for other keys, for example those of the connected devices. If the root key is compromised, the entire chain of trust is compromised. If an intermediate CA is compromised, that is bad, but remediation is possible: all certificates issued by that CA can be revoked, and a new intermediate CA can be created. Acme knows how difficult it is to protect the root key and decides to buy that service from a company that specializes in it.
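The difference between losing the root key and losing an intermediate CA follows from how a verifier walks the chain of issuers. The sketch below is a simplified model: certificates are reduced to subject and issuer names, no signatures are checked, and every name (acme-root, acme-int-2018a, and so on) is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject name of the CA that signed this certificate

@dataclass
class TrustStore:
    root: str                                  # the single trust anchor
    certs: dict = field(default_factory=dict)  # subject -> Cert
    revoked: set = field(default_factory=set)  # revoked subjects

    def add(self, cert: Cert) -> None:
        self.certs[cert.subject] = cert

    def is_trusted(self, subject: str) -> bool:
        """Follow issuer links up to the root; fail on any revoked or unknown link."""
        seen = set()
        while subject != self.root:
            if subject in self.revoked or subject not in self.certs or subject in seen:
                return False
            seen.add(subject)
            subject = self.certs[subject].issuer
        # Revoking the root invalidates everything beneath it.
        return self.root not in self.revoked

store = TrustStore(root="acme-root")
store.add(Cert("acme-int-2018a", issuer="acme-root"))
store.add(Cert("acme-int-2018b", issuer="acme-root"))
store.add(Cert("device-0001", issuer="acme-int-2018a"))
store.add(Cert("device-0002", issuer="acme-int-2018b"))

store.revoked.add("acme-int-2018a")     # one intermediate CA is compromised
print(store.is_trusted("device-0001"))  # False
print(store.is_trusted("device-0002"))  # True
```

Revoking one intermediate CA cuts off only the devices it issued. Devices under a different intermediate keep working, whereas a compromised root would leave nothing trustworthy to revoke against.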

Manufacturing security

Now that Acme and its engineering team have a CA, they can generate credentials for their devices. Each device gets a “key pair” (a private key and its corresponding public key) along with a unique certificate. These credentials need to be put on each device, and this is where friction enters the process. After validating a final hardware design for its device, Acme has found an ODM (Original Design Manufacturer) in China capable of producing it at a reasonable price.

Acme asks the ODM to flash each device with its unique key pair and certificate during manufacturing. The ODM replies that this requires custom flashing per device and will add dozens of cents to each product. Indeed, custom flashing is expensive. Acme hadn’t planned for this increase, but security is too important, so it decides to move forward with the extra cost.

To get the certificates to the ODM, Acme has two choices: (1) send a big file with all the keys and certificates to the ODM, or (2) provide an API that the ODM can call during manufacturing to retrieve each certificate at flashing time. The ODM pushes back on the second option because its manufacturing plant is not connected to the internet, for security reasons. Even if it were, each API call would drastically slow down the manufacturing process, and the calls would have to be extremely reliable so that none ever fails. They would also have to be highly secure, even requiring certificate-based authentication between the manufacturing plant and the API endpoint. Furthermore, regulations in China do not allow fully encrypted tunnels in and out of the country. The only option seems to be sending a file.

The risk of doing this is obvious. A file can be easily copied, which unfortunately happens frequently. Acme needs to trust the manufacturer to not set aside a few of those certificates, and to not release copies of the devices themselves that would be indistinguishable from the real ones. (Except for price, of course!) Every new batch will require a new file, and new opportunities for a copy to leak.

Authentication

Let’s assume for now that the ODM is trustworthy, which is in fact often the case. The device will have to use the certificate to authenticate itself with the cloud endpoint and establish the encrypted channel prior to operation. Just to say hello securely, the device first needs to open a secure pipe (over TLS), and then needs to use that pipe for the cloud and device to mutually authenticate each endpoint’s respective identity. This process requires both the device and the cloud endpoint to have the public key of the other party. Public keys of all devices connecting to the cloud endpoint will have to be uploaded to the cloud at one point or another before the authentication happens.
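The difference between server-only TLS (the bank case) and mutual authentication can be sketched with Python’s standard `ssl` module. This is only an illustration on a host with OpenSSL; a microcontroller would use an embedded TLS library, but it would need the same ingredients. The function name and the file paths mentioned below are hypothetical.

```python
import ssl
from typing import Optional

def make_device_tls_context(
    ca_file: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context.

    Without client_cert this is ordinary server-only TLS: the device checks
    the server's certificate chain and host name, but proves nothing itself.
    With client_cert and client_key, the device also presents its own
    certificate during the handshake, which is what mutual TLS requires.
    """
    # SERVER_AUTH contexts verify the server certificate and host name by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert is not None:
        # The device's certificate and private key must both be stored locally.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

A device would then call `ctx.wrap_socket(sock, server_hostname=host)` on a connected socket. Every argument to `make_device_tls_context` (CA bundle, certificate, private key), plus the TLS library itself, has to live on the device.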

To perform the mutual authentication, the device has to store its private key, its public key, a TLS stack that supports mutual authentication, and a certificate carrying the public key of the cloud endpoint, so that even its very first call to the cloud is secure. All of a sudden, memory on the device becomes a problem. That minimal stack is on the order of a few hundred kilobytes, and Acme didn’t plan for that much. The devices have a simple command-and-control system and a few sensors, and their non-volatile storage, well under 100 kB, is insufficient. Acme will need to move to a more powerful architecture, adding cost to the original design.

Secure storage and secure boot

With more memory (and added costs), Acme is now looking for the best way to store the private key securely. Indeed, what use is a private key if someone can retrieve it from the device firmware, either by physically tampering with the device or by taking control of it remotely? An attacker who copies the private key can start connecting to the cloud endpoint as the device and access data it is not supposed to see.

If the device is compromised, its firmware can be modified and the device can be put to purposes other than those it was intended for, much as Mirai repurposed the devices it took over. Validating the firmware through signature verification before it even boots is critical to ensure that what runs on the device is legitimate. That check offers no real protection, however, unless the validation lives in a memory location separate from the firmware itself. Otherwise, an attacker who can rewrite the firmware can rewrite the check as well.
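The verify-before-boot idea can be sketched as follows. As a simplification, this compares the image’s SHA-256 digest against a digest recorded at provisioning time, rather than verifying a signature as described above. The protected store is modeled by a read-only mapping standing in for separate secure memory, and all names are hypothetical.

```python
import hashlib
import hmac
from types import MappingProxyType

def provision_secure_store(golden_image: bytes) -> MappingProxyType:
    """Record the expected firmware digest in a read-only store.

    Stand-in for memory the firmware cannot rewrite (e.g. a secure element).
    """
    return MappingProxyType({"fw_sha256": hashlib.sha256(golden_image).digest()})

def verify_before_boot(image: bytes, secure_store) -> bool:
    """True only if the image matches the digest recorded at provisioning."""
    measured = hashlib.sha256(image).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(measured, secure_store["fw_sha256"])

def boot(image: bytes, secure_store) -> str:
    """Refuse to hand control to firmware that fails verification."""
    if not verify_before_boot(image, secure_store):
        return "halted: firmware digest mismatch"
    return "booting"

store = provision_secure_store(b"firmware v1")
print(boot(b"firmware v1", store))           # booting
print(boot(b"firmware v1 + implant", store)) # halted: firmware digest mismatch
```

A stored digest has to be rewritten for every legitimate firmware update. Verifying a signature against a protected public key avoids that, which is one reason signature-based secure boot is generally preferred.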

Rotating keys

Just as a user changes their password from time to time to shrink the window in which an attacker can use a compromised password, devices need to be able to rotate their keys. That rotation is not as simple as getting a new key. Imagine the cloud system tells the device to change its keys: the cloud generates a new pair, the device downloads it securely using the old key, and the cloud then invalidates the old public key for the device and replaces it with the new one. You have to hope that the device manages to update its key pair at this stage, because if it doesn’t, the user ends up with a brick. It is critical that a single device can have several valid keys at the same time, so that keys can be rotated and the device can revert to a working state if the process fails.
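The multi-key failsafe can be sketched as a small cloud-side registry. This is a hypothetical model, not the Cloud IoT Core API: keys are plain identifiers, and the limit of three keys per device mirrors the figure this post gives for Cloud IoT Core.

```python
from dataclasses import dataclass, field
from typing import List

MAX_KEYS = 3  # assumed per-device key limit (the post cites three for Cloud IoT Core)

@dataclass
class DeviceRecord:
    """Cloud-side view of one device's registered public keys (hypothetical)."""
    public_keys: List[str] = field(default_factory=list)

    def accepts(self, key_id: str) -> bool:
        return key_id in self.public_keys

    def add_key(self, key_id: str) -> None:
        if len(self.public_keys) >= MAX_KEYS:
            raise ValueError("key slots full; retire a key first")
        self.public_keys.append(key_id)

    def retire_key(self, key_id: str) -> None:
        if len(self.public_keys) <= 1:
            raise ValueError("refusing to remove the device's last key")
        self.public_keys.remove(key_id)

def rotate(record: DeviceRecord, old: str, new: str, device_switched: bool) -> None:
    """Register the new key alongside the old one, then commit or roll back.

    device_switched would be set once the device has successfully
    authenticated with the new key; until then both keys stay valid.
    """
    record.add_key(new)           # transition: old and new keys both accepted
    if device_switched:
        record.retire_key(old)    # commit: the device proved it holds the new key
    else:
        record.retire_key(new)    # roll back: the device keeps working with the old key
```

Because the old key is removed only after the new one has been proven, a failed update leaves the device exactly where it started instead of bricked.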

Summary of the situation

For Acme, the cost of securing the device has skyrocketed, as has the complexity of implementing and maintaining a high level of security. Let’s summarize:

  1. Acme needs certificates and therefore a Certificate Authority that needs to be protected with the highest level of care.
  2. Burning the keys into the device means balancing the cost in dollars (and finding an appropriate ODM) against the risk of credentials being compromised (copied) during manufacturing.
  3. Acme will need TLS to secure communication, which now requires a bloated TLS stack on the device and a larger memory footprint than anticipated. These resource demands increase further once you integrate the Online Certificate Status Protocol (OCSP) to check the broker’s certificate, which requires additional (memory-consuming) keys and (CPU-consuming) requests.
  4. Keys are extremely difficult, if not impossible, to store securely in the firmware.
  5. Secure boot, which stops the device from running compromised firmware, is impossible without separate secure storage.
  6. Refreshing keys requires the cloud solution to be able to store several identities per device as a failsafe.

At Google, we have taken a hard look at this situation, and we believe we have come up with a solution that can serve companies like Acme very well. The main demonstration of this solution is our partnership with Microchip.

Step 1: Use a secure element

A secure element is a piece of hardware that can securely store key material. It usually comes with anti-tampering capabilities designed to block attempts to physically attack the device and retrieve the keys.

All IoT devices should have a secure element; it is the only way to store the private key securely. All secure elements do that well, but some do more. For example, the Microchip ATECC608A cryptographic coprocessor not only stores the private keys but also validates the firmware, enabling a more secure boot process for the device.

Microchip ATECC608A

The ATECC608A offers even more features. For example, the private key is generated by the secure element itself, not by an external party (such as the CA). The chip uses a random number generator to create the key, making it virtually impossible to guess. The private key never leaves the chip, ever. From the private key, the chip derives the corresponding public key, which the company’s chosen CA can then sign.

Microchip performs this signature in a dedicated secure facility in the US, where an isolated plant will store the customer’s intermediate CA keys in a highly secure server plugged into the manufacturing line. The key pairs and certificates are all generated in this line in a regulatory environment which allows auditing and a high level of encryption.

Once the secure elements have each generated their key pairs, the corresponding public keys are sent to the customer’s Google Cloud account and stored securely in the Cloud IoT Core device manager. Because Cloud IoT Core can store up to three public keys per device, keys can be rotated with a built-in failsafe.

All the customer has to do is provide Microchip with an intermediate CA for a given batch of devices, and Microchip returns a roll of secure elements. These rolls can be sent to any manufacturer to be soldered onto the final PCB at high speed, with no customization, no risk of copying, and very low cost.

Step 2: Use a JWT for authentication

TLS is perfect for securing the communication between the device and the cloud, but its authentication stack is not ideal for IoT. The stack required for mutual authentication is large, and it has another downside: it needs to know where the keys are stored. The TLS stack has to know which secure element is used and how to communicate with it. An OpenSSL stack, for instance, assumes the keys are stored in a file system and needs to be modified to access the secure element. This requires development and testing that have to be redone at each update of the stack. With TLS 1.3 coming up, this work will likely have to happen several times, which is a cost for the company. The company can instead use a TLS stack that is already compatible with the secure element, such as wolfSSL, but the licensing cost adds to the cost of the device.

Google Cloud IoT uses a widely adopted JSON Web Token (JWT) to authenticate the device, instead of relying on the mutual authentication of a TLS stack.

The device establishes a secure connection to the global Cloud IoT Core endpoint (mqtt.googleapis.com) using TLS, but instead of triggering mutual authentication, it generates a very simple JWT, signs it with its private key, and passes it as the password. The Microchip ATECC608 offers a simple interface to sign the device JWT securely without ever exposing the private key. When Google Cloud IoT receives the JWT, it retrieves the device’s public key and uses it to verify the JWT signature. If the signature is valid, mutual authentication is effectively established: TLS has already authenticated the server to the device, and the JWT authenticates the device to the server. The customer sets the JWT’s validity period, which can never exceed 24 hours, so each token is short-lived.
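Here is a minimal sketch of how a device could assemble such a token. It assumes the ES256 algorithm and the claims Cloud IoT Core expects (`iat`, `exp`, and `aud` set to the Google Cloud project ID). The `sign` callback is a hypothetical stand-in for the secure element’s signing command, which for ES256 must return the 64-byte r||s signature.

```python
import base64
import json
import time
from typing import Callable, Optional

MAX_LIFETIME_S = 24 * 60 * 60  # upper bound on token validity (24 hours)

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_device_jwt(
    project_id: str,
    sign: Callable[[bytes], bytes],
    lifetime_s: int = 60 * 60,
    now: Optional[int] = None,
) -> str:
    """Assemble header.claims, sign it, and return the compact JWT string.

    `sign` represents the secure element: it receives the signing input
    and returns signature bytes, so the private key never reaches this code.
    """
    if not 0 < lifetime_s <= MAX_LIFETIME_S:
        raise ValueError("token lifetime must be between 1 second and 24 hours")
    iat = int(time.time()) if now is None else now
    header = {"alg": "ES256", "typ": "JWT"}
    claims = {"iat": iat, "exp": iat + lifetime_s, "aud": project_id}
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = sign(signing_input.encode("ascii"))
    return signing_input + "." + b64url(signature)
```

The returned string is what the device passes as the MQTT password. On expiry, the device simply mints a new token and reconnects, without ever touching the private key directly.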

Secure flow with Microchip and Cloud IoT’s Device Manager

There are several benefits to this approach:

  1. There is no dependency on the TLS stack used to perform the device authentication. Updating the TLS stack to 1.3 will be a breeze.
  2. The devices do not need to store their public key and certificate, which releases a significant portion of memory on the device.
  3. The device does not need to host a full TLS stack, which again releases memory for the application.
  4. The memory requirements are well under 50 kB, which opens the door to using a much smaller MCU (microcontroller unit).

With these two steps, the full complexity of handling certificates is removed and customers can focus on their product and customer experience.

Conclusion

Security is complex, and, as we noted in the introduction, it cannot be an afterthought. Fortunately, with the JWT authentication scheme and the partnership with Microchip around the ATECC608, security becomes a simple bill-of-materials (BOM) item. Google and Microchip even agreed on a discounted price of around 50 cents. This means customers pay less than a dollar not only to bring increased security to identity provisioning, authentication, and encryption, but also to free up a large amount of space on the device, enabling smaller and cheaper MCUs in the final design.

The chip can even be retrofitted into existing designs as a companion chip since the secure element communicates easily over I2C. We hope you’ll consider integrating the ATECC608 in every IoT design you are looking into.

To learn more, take a look at the following links:

We’ll also be presenting our work around IoT and security at Google Cloud’s NEXT 2018 event on July 24-26 in San Francisco. Here are a couple of sessions you might be interested in:
