Edge computing—challenges and opportunities for enterprise cloud architects
Joshua Landman
Customer Engineer, Application Modernization Specialist, Google
Praveen Rajagopalan
Customer Engineer, Application Modernization Specialist, Google
As we explored in part one of this series, today’s edge computing environments represent an enormous opportunity for enterprises, both in the new use cases they enable and in their potential to reduce costs. But edge computing is also a sea change in how applications are architected. When building a new edge application, keep the following challenges in mind.
1. Intermittent connectivity
More often than not, an edge device is disconnected from the data center. Whether it’s a sensor on a factory floor or a connected vehicle, cloud architects cannot always assume reliable, fast network connectivity for their edge devices.
This leads to some important design considerations. For one, connectivity should be self-contained within the remote location. Practically speaking, all the edge devices on a site’s network may have only a single route to the local private services cluster, and the link from that cluster to the mothership (the cloud or a corporate data center aggregation hop) may be down. The remote edge system should therefore be designed to be fault tolerant: it should work independently of its connection to the core.
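One common way to make a site work independently of its uplink is a store-and-forward buffer: local producers always write to a local queue, and a separate flush step drains that queue only when the link to the core is available. The sketch below models this in memory and assumes a hypothetical `send` callable that returns `True` on confirmed delivery; a production outbox would persist the queue to local disk so it survives restarts.

```python
from collections import deque
from typing import Callable, Deque


class Outbox:
    """Store-and-forward buffer: records stay queued locally until the
    uplink confirms delivery. Hypothetical sketch; a real outbox would
    persist records to local disk."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        # `send` is an assumed uplink hook: True on confirmed delivery,
        # False (or ConnectionError) when the link to the core is down.
        self._send = send
        self._pending: Deque[bytes] = deque()

    def enqueue(self, record: bytes) -> None:
        # Local capture never waits on the network.
        self._pending.append(record)

    def flush(self) -> int:
        """Drain the queue in order and stop at the first failed send.
        Returns the number of records delivered."""
        delivered = 0
        while self._pending:
            try:
                ok = self._send(self._pending[0])
            except ConnectionError:
                ok = False
            if not ok:
                break  # link is down; keep the record for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

Because a record is removed only after a successful send, an outage in the middle of a flush loses nothing; the remaining records simply wait for the next attempt.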
2. Disconnectedness and data capture
When site network outages at the edge happen, they can have downstream impacts.
For example, imagine local video cameras that save captures to a locally deployed, autoscaling containerized service, which later transmits them back to the cloud. When many cameras are active, receiver pods spin up and write to disk. But the local cluster that prepares data for transport may only be able to send data back to the main corporate data center or cloud at specific planned times, or only after considerable local filtering has been applied. There needs to be a strategy in place to ensure that edge disks do not fill up with captured video if there is a long gap between syncs.
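One such strategy is a fixed storage budget with oldest-first eviction, so that new captures keep landing even when the next sync window is far away. This minimal sketch models the disk as an in-memory `OrderedDict`; the class name, the byte budget, and the choice to drop the oldest capture (rather than, say, downsample it) are illustrative assumptions.

```python
from collections import OrderedDict
from typing import List


class CaptureStore:
    """Keeps captures under a fixed byte budget by evicting the oldest
    first. Hypothetical sketch: the 'disk' is an in-memory OrderedDict."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.evicted = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            raise ValueError("capture is larger than the whole budget")
        if key in self._items:
            # Overwriting a key frees its old bytes first.
            self.used -= len(self._items.pop(key))
        # Evict the oldest captures until the new one fits.
        while self.used + len(data) > self.max_bytes:
            _, oldest = self._items.popitem(last=False)
            self.used -= len(oldest)
            self.evicted += 1
        self._items[key] = data
        self.used += len(data)

    def keys(self) -> List[str]:
        return list(self._items)
```

Oldest-first eviction favors recent footage; if older captures are more valuable for a given use case, the eviction order would need to change.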
In this way, the edge location behaves much like a cache: to avoid a system failure during a long period of isolation, captured data needs a time to live (TTL). This is particularly important because most edge installations aren’t monitored in real time, so they must be able to function in a degraded state. The broader system also needs to tolerate gaps in data capture to accommodate unexpectedly long periods of disconnection.
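The TTL idea can be sketched as an index of capture timestamps that a periodic job sweeps. The clock is injectable so the behavior can be tested without waiting; the class name and sweep interface are hypothetical, and the caller is assumed to delete the underlying files for whatever keys are returned.

```python
import time
from typing import Callable, Dict, List


class TTLCaptureIndex:
    """Tracks when each capture was written and expires entries older
    than `ttl_seconds`. Hypothetical sketch: the caller deletes the
    files for whatever keys `expire()` returns."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._written_at: Dict[str, float] = {}

    def record(self, key: str) -> None:
        self._written_at[key] = self._clock()

    def expire(self) -> List[str]:
        """Remove and return keys whose age has reached the TTL."""
        now = self._clock()
        expired = [k for k, t in self._written_at.items()
                   if now - t >= self.ttl]
        for k in expired:
            del self._written_at[k]
        return expired

    def __contains__(self, key: str) -> bool:
        return key in self._written_at
```

`time.monotonic` does not carry over across a reboot, so a persistent implementation would store wall-clock timestamps alongside the files instead.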
Knock-on effects like these can cause unanticipated failures at edge sites. So in addition to planning for failure, architects need to plan, starting in the discovery phase, for how to deploy to and configure remote systems once they come back online.
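For the reconnection side, a widely used pattern is exponential backoff with jitter, so that many sites coming back after a regional outage do not all retry at the same instant. The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. `connect` and `on_reconnect` are hypothetical hooks; `on_reconnect` is where a site would pull pending configuration and flush buffered data.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 300.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Exponential backoff with full jitter: the n-th delay is drawn
    uniformly from [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    ceiling = base
    while True:
        yield rng.uniform(0, ceiling)
        ceiling = min(cap, ceiling * 2)


def reconnect(connect: Callable[[], bool],
              on_reconnect: Callable[[], None],
              max_attempts: int = 10,
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> bool:
    """Retry `connect` with jittered backoff and run `on_reconnect` once
    the link is back. Both hooks are hypothetical."""
    delays = backoff_delays(rng=rng)
    for _ in range(max_attempts):
        if connect():
            on_reconnect()  # e.g. pull pending config, flush buffered data
            return True
        sleep(next(delays))
    return False
```

Capping the ceiling keeps a site that has been offline for days from waiting hours between attempts once its link is actually restored.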
3. Hardware failure and serviceability
Hardware and upgrade failures happen. An upgrade to edge devices or services can fail, leaving clusters or devices in a failed state. Disks develop bad sectors. NICs fail. Power supplies burn out. Any of these can require physical servicing. While you can design your application to limit the blast radius of a failure, you also need service schedules and action plans for your edge environments; you cannot expect a store manager to be trained to maintain failed edge hardware. Consider edge cluster pairs and blue-green style deployments across fleets, and set the expectation that remote site visits will be part of any edge program.
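A blue-green upgrade at a single site, plus a fleet rollout that stops early when upgrades start failing, might look like the following sketch. Each site is modeled with two slots; `install` and `healthy` are hypothetical hooks standing in for whatever installer and health probe a real deployment uses, and `max_failures` is an assumed policy knob.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EdgeSite:
    """Hypothetical model of a site with two deployment slots."""
    name: str
    live: str = "blue"  # slot currently serving
    slots: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"})


def blue_green_upgrade(site: EdgeSite, version: str,
                       install: Callable[[EdgeSite, str, str], None],
                       healthy: Callable[[EdgeSite, str], bool]) -> bool:
    """Install into the idle slot and cut over only if it passes a
    health check; the live slot is never modified."""
    idle = "green" if site.live == "blue" else "blue"
    try:
        install(site, idle, version)  # hypothetical installer hook
    except Exception:
        return False  # failed install: keep serving from the live slot
    site.slots[idle] = version
    if not healthy(site, idle):  # hypothetical health probe
        return False  # new slot unhealthy: no cutover
    site.live = idle
    return True


def rollout(sites: List[EdgeSite], version: str,
            install: Callable[[EdgeSite, str, str], None],
            healthy: Callable[[EdgeSite, str], bool],
            max_failures: int = 1) -> List[str]:
    """Upgrade sites in order, halting once `max_failures` sites fail.
    Returns the names of the sites that failed."""
    failed: List[str] = []
    for site in sites:
        if not blue_green_upgrade(site, version, install, healthy):
            failed.append(site.name)
            if len(failed) >= max_failures:
                break  # remaining sites stay on the current version
    return failed
```

Because the live slot is untouched until the new one passes its health check, a failed upgrade leaves the site on the known-good version instead of in a failed state, and halting the wave keeps a bad release from reaching the rest of the fleet.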
4. Complex fleet management and configuration
With edge architecture evolving and maturing so quickly, configuration management tools are the key to success, particularly tools that deal gracefully with instability and intermittent connectivity. They are critical for tasks like pushing out system configuration and software or security updates, instructing devices to pull new software, and deploying new or refreshed models and algorithm updates for processing work. Managing a remote fleet also requires a minimum guarantee that each location will be able to connect at some point. Remote locations that never reconnect are no longer edges and must be physically managed, typically by a third party.
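Tools that tolerate intermittent connectivity commonly use a pull-based, desired-state model: each site periodically asks for the latest versioned configuration and applies it only if it is newer than what it already runs, so a site that was offline for a week simply catches up on its next successful sync. This sketch assumes a hypothetical `fetch` hook and monotonically increasing version numbers. It also adds a helper that flags sites whose last check-in is older than an assumed threshold, as candidates for a physical visit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DesiredState:
    version: int  # assumed to increase with every published change
    config: Dict[str, str]


class SyncAgent:
    """Pull-based agent: applies the latest desired state when it is
    newer than what the site already runs. Hooks are hypothetical."""

    def __init__(self, fetch: Callable[[], Optional[DesiredState]],
                 apply: Callable[[Dict[str, str]], None]) -> None:
        self._fetch = fetch
        self._apply = apply
        self.applied_version = 0

    def sync(self) -> bool:
        """Returns True if a new configuration was applied."""
        try:
            desired = self._fetch()
        except ConnectionError:
            return False  # offline: keep running the last applied config
        if desired is None or desired.version <= self.applied_version:
            return False  # already up to date
        self._apply(desired.config)
        self.applied_version = desired.version
        return True


def sites_needing_visit(last_check_in: Dict[str, float], now: float,
                        max_silence: float) -> List[str]:
    """Sites that have been silent for longer than `max_silence` seconds."""
    return sorted(site for site, t in last_check_in.items()
                  if now - t > max_silence)
```

Comparing versions rather than replaying every intermediate change means a site only ever applies the newest state, which keeps catch-up after a long outage cheap.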
The enterprise advantage
Enterprise architects face plenty of challenges in creating effective edge applications, but some factors make enterprises particularly well suited to edge computing.
For one thing, in most enterprise scenarios we are talking about private services at the edge. Why? Well, the services enterprises expose to the public usually run in core data centers or the cloud, complete with load balancers, redundancy and autoscaling. Those public services frequently require four nines of uptime or better. Most edge services, however, don’t require those service level objectives. Edge applications are more likely to be private, used to capture local telemetry or distribute processing power to remote locations, and the enterprise will continue to function normally if an edge service is down. For example, you might use the edge to collect and filter data, send it back to a central ML engine at a cloud provider to build models, and then receive and serve those models remotely. There’s rarely a need for public IPs or bare-metal load balancers, and the endpoints for modern edge services have minimal internet bandwidth requirements. In fact, they may only need to be connected periodically! (As a corollary, public-facing services are generally not a good fit for the edge.)
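The collect-and-filter step is one reason bandwidth requirements can stay low: instead of streaming raw telemetry, a site can reduce each window of readings to a compact summary and forward only anomalous values verbatim. This is a minimal sketch; the `Summary` fields and the fixed `[low, high]` range used to define an outlier are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class Summary:
    """Hypothetical compact record sent upstream instead of raw data."""
    count: int
    mean: float
    minimum: float
    maximum: float
    outliers: List[float]


def summarize(readings: Sequence[float], low: float, high: float) -> Summary:
    """Reduce a window of raw readings to one record; only readings
    outside [low, high] are kept verbatim."""
    if not readings:
        raise ValueError("empty window")
    return Summary(
        count=len(readings),
        mean=mean(readings),
        minimum=min(readings),
        maximum=max(readings),
        outliers=[r for r in readings if r < low or r > high],
    )
```

A window of thousands of in-range readings collapses to a handful of numbers, which is what lets such a site tolerate only periodic connectivity.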
Relatedly, with edge, you can run micro clusters or even single nodes using commodity hardware. Architects tend to design data center environments for three or four nines of uptime, complete with fat network pipes and full monitoring. Not so for edge locations, particularly if there is a large number of them. Edge architecture plans for intermittent failure — we accept and design for a greater frequency of hardware failure at the edge than would be tolerated in a core data center or cloud provider. And once that kind of fault-tolerant design is in place, it stands to reason that the hardware itself can afford to be a little bit more failure-prone.
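The arithmetic behind trading hardware reliability for redundancy is the standard formula for parallel components: if each of n nodes is independently available a fraction a of the time, a group that stays up while any node is up is available 1 - (1 - a)^n of the time. The sketch also computes how many sites in a large fleet to expect down at any moment. The figures in the note below are illustrative, and the independence assumption is optimistic at the edge, where nodes at one site often share power and network.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability of a group that stays up while at least one node is
    up, assuming node failures are independent."""
    return 1 - (1 - node_availability) ** nodes


def expected_sites_down(site_availability: float, sites: int) -> float:
    """Expected number of sites unavailable at any given moment."""
    return sites * (1 - site_availability)
```

For example, two commodity nodes at 99% availability each give roughly 99.99% for the pair, while a fleet of 1,000 sites at 99% will have about 10 sites down at any given time, which is why failure at the edge has to be designed for rather than prevented.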
Edge challenges = business opportunities
When it comes to a certain class of enterprise applications, the edge’s weaknesses can be its strengths. If you have an application that does not depend on other apps and that can function for extended periods while disconnected from core data center systems, then implementing an edge architecture with remote services can be an important lever for driving digital transformation down into heretofore undigitized parts of the organization.
In our next post, we’ll show you some of the Google Cloud tools and techniques you can use as part of this architecture.