Deploying and operating cloud-based 5G networks
Communication services providers (CSPs) are experiencing a period of disruption. Overall revenue growth is decelerating and is projected to remain below 1 percent per year, following a trend that started even before the pandemic.1 At the same time, driven by the pandemic, data consumption in 2020 increased by 30 percent relative to 2019, with some operators even reporting increases of 60 percent.2
The combination of pressure on revenues with rising data traffic costs is forcing operators to innovate in three fundamental ways. First, operators are looking to establish new sources of revenue. Second, increased network utilization must be met with a reduction in network cost. And third, there is an opportunity to gain new customers by improving the customer experience.
Fortunately, 5G offers a path forward across each of these three areas. Concepts such as network slicing and private networks allow CSPs to offer differentiated network services to public sector and enterprise customers. The disaggregation of hardware and software allows new vendors with unique strengths to enter the market and to enable CSPs to build, deploy, and operate networks in fundamentally new ways. And the ability to place workloads at the edge permits CSPs to offer compelling experiences to consumers and businesses alike. In this blog, we will discuss how CSPs can create a solid foundation for their cloud networks.
Understanding telecommunications networks
First, it is useful to consider the way telecommunications networks were traditionally built. Initially, networks were built using physical network functions (PNFs) — appliances that used a tight combination of hardware and software to perform a specific function. PNFs offered the benefit of being purpose-built for a specific application, but they were inflexible and difficult to upgrade. As an example, deploying new features frequently required replacing the entire PNF, i.e., deploying a new hardware appliance.
The first step in improving deployment agility came with the concept of virtualized network functions (VNFs), software workloads designed to operate on commercial off-the-shelf (COTS) hardware. Rather than utilizing an integrated hardware and software appliance, VNFs disaggregated the hardware from the software. As such, it became possible to procure the hardware from one vendor and the software from another. It also became possible to separate the hardware and software upgrade cycles.
However, while VNFs offered advantages over PNFs, VNFs were still an intermediate step. First, they typically needed to be run within a virtual machine (VM), and as such required a hypervisor to interface between the host operating system (OS) and the guest OS inside the VM. The hypervisor consumed CPU cycles and added inefficiency. Second, the VNF itself was frequently designed as a monolithic function. This meant that while it was possible to upgrade the VNF separately from the hardware, such an upgrade, even for a feature that affected only a portion of the VNF, required deployment of the entire large VNF. This created risk and operational complexity, which in turn meant that upgrades were delayed just as they were with PNFs.
Creating the foundation for cloud networks
The trick to establishing your cloud based network resides in the challenge of moving from VNFs to containerized network functions (CNFs) — network functions organized as containers as a collection of small programs, each of which can be independently operated.
The concept of containers is not new. In fact, Google has been using containerized workloads for over 15 years. Kubernetes, which Google developed and open-sourced, is the world’s most popular container orchestration system, and is based on Borg, Google’s internal container management system.3 There are lots of benefits to using containers, but fundamentally, it frees developers from worrying about resource scheduling, interprocess communication, security, self-healing, load balancing, and many other tedious (but important!) tasks.
Consider just a couple examples of benefits that containerization brings to network functions. First, when upgrading the network function to implement new features, you no longer need to re-deploy the entire network function. Instead, you only need to re-deploy the containers that are affected by the upgrade. This improves developer velocity and reduces the risk of the upgrade because, rather than infrequent upgrades that each introduce substantial changes, you can now have frequent upgrades that each deploy small changes. Small changes are less risky because they are easier to understand and to roll back in case of anomaly. Incidentally, this also improves your security posture because it reduces the time between when a security vulnerability is discovered and when a patch is deployed.
Speaking of security, another example of the benefits that containerization brings to network functions is an automatic zero-trust security posture. In Kubernetes, the communication among microservices can be handled by a service mesh, which manages mundane aspects of inter-services communication such as retries in case of failure and providing observability into communication. It can also manage other essential aspects such as security. For example, Anthos Service Mesh, which is a fully-managed implementation of the open-source Istio service mesh (also co-developed by Google), includes the ability to authenticate and encrypt all communications using mutual TLS (mTLS) and to deploy fine-grained access control for each individual microservice.
Automation and orchestration for cloud networks
CNFs bring tremendous benefits, but they also bring challenges. In place of a relatively small number of network appliances, we now have a large number of containers, each of which requires configuration, management, and maintenance. In the past, many of these processes were accomplished using manual techniques, but this is impossible to accomplish economically and reliably at the scale required by CNFs.
Fortunately, there are cloud-native approaches to solving these challenges. First, consider the problem of autonomously deploying and maintaining CNFs. The ideal way is to use the concept of Configuration as Data. Unlike imperative techniques such as Infrastructure as Code, which provide a detailed description of a sequence of steps that need to be executed to achieve an objective, Configuration as Data is a declarative method whereby the user specifies the desired end state (i.e., the actual desired configuration) and relies on automated controllers to continuously drive the infrastructure to achieve that state. Kubernetes includes such automated controllers, and the great news is that this method can be used not just for infrastructure but also for the applications residing on top of it, including CNFs. This cloud-native technique frees you from the toil and associated risk of writing detailed configuration procedures, so you can focus on the business logic of your applications.
As another example, consider the problem of understanding your network performance, including anomaly detection, root cause analysis, and resolution. The cloud-native approach starts with creating a data platform where both infrastructure and CNF monitoring data can be ingested, regularized, processed, and stored. You can then correlate data sets against each other to detect anomalies, and with AI/ML techniques, you can even anticipate anomalies before they happen. AI/ML is likewise indispensable in gaining an understanding of why the anomaly is happening, i.e. performing root cause analysis, and automated closed-loop controllers can be developed to correct the problem, ideally before it even happens.
Architecting for the edge
The transition from VNFs to CNFs is a critical piece in addressing the challenge that CSPs face today, but it alone is not enough. CNFs need infrastructure to run on, and not all infrastructure is created equal.
Consider a typical 5G network. There are some functions, such as those associated with an access network, that need to be deployed at the edge. These functions require low latency, high throughput, or even a combination of the two. In 5G networks, examples of such functions include the radio unit (RU), distributed unit (DU), centralized unit (CU), and the user plane function (UPF). The first three are components of the radio access network (RAN), while the last is a component of the 5G core. At the same time, there are some other control plane functions such as the session management function (SMF) or the authentication and mobility management function (AMF) that do not have such tight latency and high throughput requirements and can thus be placed in a more centralized data center. Furthermore, consider an AI/ML use case where a particular model (perhaps for radio traffic steering) needs to run at the network edge because of its latency requirements. While the model itself needs to run at the edge, model training (i.e., generating the model coefficients) is frequently a compute-intensive exercise that is latency-insensitive and is thus more optimal to run in a public cloud region.
All of these use cases have one thing in common: they call for a hybrid deployment environment. Some applications must be deployed at the edge as close to the user as possible. Others can be deployed in a more centralized environment. Still others can be deployed in a public cloud region to take advantage of the large amount of compute and economies of scale available therein. Wouldn’t it be convenient — if not transformational — if you could use a single environment for deploying at the edge, in a private datacenter, and in public cloud, with a consistent set of security, lifecycle management, policy, and orchestration resources across all such locations? This is indeed what Google Distributed Cloud, enabled by Anthos, brings to the table.
With Google Distributed Cloud, you can architect a 5G network deployment such as the one shown below.
Business benefits of cloud networks
Beyond the technical benefits, consider the business benefits of such an architecture. First, by following the best practices of hardware and software disaggregation, it permits the CSP to procure the infrastructure and the network functions from different vendors, spurring competition among vendors. Second, each workload is placed in precisely the right location, enabling efficient utilization of hardware resources and offering compelling low-latency, high-throughput services to users. Third, because the architecture utilizes a common hybrid platform (Anthos), it makes it easy to move workloads across infrastructure locations. Fourth, the separation of workloads into microservices accelerates time-to-market when developing new features or applications, such as those enabling enterprise use cases. And finally, the container management platform supports the simultaneous deployment of both network functions and edge applications on the same infrastructure, allowing the operator to deploy new experiences such as AR/VR directly on bare metal as close to the user as possible.
The next generation cloud network is now
There is a lot more we could say, but perhaps the most important takeaway is that this architecture is not a future dream. It exists today, and Google is working with leading CSPs and network vendor partners to deploy it, helping them realize the promise of 5G to deliver new revenues, reduce operating costs, and enable new customer experiences.
To learn more, watch the video series on the cloudification of CSP networks.
Discover what’s happening at the edge: How CSPs Can Innovate at the Edge.