Kubernetes users, get ready for the next chapter in microservices management


If you’re a developer, containers can feel kind of like magic—they’re portable, efficient, and make it a breeze to spin up new applications. But while containers are great for simple applications, you need extra support as you build them out into larger applications and services. When containers arrived on the scene a few years ago, IT organizations were really struggling to manage them. Deploying containers to their allotted infrastructure was a relatively manual task, with little support for auto-scaling that infrastructure up and down quickly and efficiently.  

By 2014, we’d been running Borg, our internal container resource manager, for ten years, and thought that putting a version of it in the hands of the open-source community could really help. Four years later, Kubernetes has become the dominant container orchestration platform. According to IDC¹, 43% of enterprises already use public container services. We have seen evidence to that effect in our own offering, Google Kubernetes Engine (GKE). More than 80% of our largest customers already use GKE to run their workloads in production, and over 40% of GKE clusters are running stateful workloads.

Adopting Kubernetes brings its share of benefits: resources are used more efficiently and availability improves. Developers can deploy new applications more frequently, increasing the velocity with which they create new software. Kubernetes has also introduced an extensible declarative config framework that is ushering in a consistent operation model for all services—even third-party services. More immediately, the combination of Kubernetes and containers has allowed developers to adopt a microservices architecture—discrete units of software inside single-purpose services that you can then knit into large, distributed applications that run across hybrid cloud and on-prem environments.

The missing management layer

We also know from our experience running global infrastructure that although container orchestration is a critical IT function, it’s not enough, especially if you’re running a distributed application of any size, which might decompose a single monolithic application into hundreds or even thousands of individual components. At some point, you need tools to manage the collection of microservices, and to ensure consistent policies across them. More importantly, these policies need to be decoupled from the individual services, so that they can be more uniform and updated independently of the services.

At Google, we realized years ago that to run our services reliably we needed a common system for monitoring, logging, authorization, and billing, so that individual teams didn’t have to solve these problems themselves, differently every time. Our solution was to use Stubby, a remote procedure call mechanism, plus control-plane functionality via a sidecar proxy. With this combination, we get the benefits of a service mesh: automated telemetry, mutual service authentication and encryption, client-side load balancing, advanced networking, as well as language-independent integration of our backend systems. The sidecar + control plane model was designed specifically to allow us to address the needs both of internal services and external, public APIs (which had previously required different mechanisms) with one platform.

Our internal service platform gives us uniformity across our services; our Site Reliability Engineers (SREs) get standard dashboards for every service without having to manually integrate monitoring tools. Similarly, services are automatically integrated with our authentication and policy engines (for example, for quota and rate limiting). Now, developers don’t need to reinvent the wheel for every new service they create, and SREs have a standard way to operate them.

Answering the call with Istio

As large Kubernetes shops deploy more microservices, we believe that they’ll need similar capabilities. To meet that need, we teamed up last year with Lyft and IBM on Istio, an open-source service mesh for today’s hybrid, distributed applications. Istio offers visibility in the form of telemetry for monitoring and logs for your services, plus security by giving each service a strong identity based on its role, as well as enabling encryption by default. With that core functionality in place, Istio can also be the basis for higher-level services, e.g., helping to enforce network security policies, or controlling software rollouts through canary deployments.

Istio also ensures a proper decoupling between development and operations, allowing operations teams to change the behavior of the system without actually changing the source code. One example is retry policy: imagine a system in which a key microservice begins to display abnormally high latency. If the services dependent on that microservice have an aggressive retry policy, they can quickly oversaturate the lagging service, thus making the problem worse. Istio gives the operations team the ability to change that retry policy—having the dependent systems back off—without changing the source code and without redeploying it. Ops teams can also change circuit-breaking policies, redirect traffic, run canary deployments and more.
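To make the retry example concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Istio's API or configuration format; the function names and the simple failure model (every attempt fails independently with the same probability) are assumptions made purely for illustration. It estimates how much traffic an aggressive retry policy piles onto a struggling service, and what an exponential backoff schedule looks like by comparison.

```python
from typing import List


def offered_load(base_rps: float, max_retries: int, failure_rate: float) -> float:
    """Estimate requests per second hitting a service once retries are counted.

    Hypothetical model: each attempt fails independently with probability
    failure_rate, and a caller retries a failed attempt up to max_retries times.
    """
    # Expected attempts per original request: 1 + p + p^2 + ... + p^max_retries
    attempts = sum(failure_rate ** k for k in range(max_retries + 1))
    return base_rps * attempts


def backoff_delays(base_delay_s: float, max_retries: int, cap_s: float) -> List[float]:
    """Exponential backoff: the wait doubles on each retry, up to cap_s seconds."""
    return [min(cap_s, base_delay_s * 2 ** attempt) for attempt in range(max_retries)]


if __name__ == "__main__":
    # A service that normally sees 1,000 requests/s and is timing out on every call.
    print(offered_load(1000, max_retries=5, failure_rate=1.0))  # aggressive policy
    print(offered_load(1000, max_retries=1, failure_rate=1.0))  # relaxed policy
    print(backoff_delays(0.1, max_retries=6, cap_s=1.0))
```

Under these assumptions, five immediate retries turn 1,000 requests per second into 6,000 against a service that is already falling behind, while a single retry caps the load at 2,000. With a service mesh, the operations team adjusts that number in policy rather than in each caller's source code.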

This decoupling of development and operations logic that Istio provides accomplishes two things: it allows your developers to focus on writing business logic, not infrastructure (thus making them more productive), and it gives your operations teams the tools they need to run your applications and services more reliably.

Already, the promise of Istio is resonating with big Kubernetes users including Descartes Labs, eBay, and AutoTrader UK. Descartes Labs, for example, uses machine learning for its predictive intelligence service, using APIs running in GKE clusters. Kubernetes gave them the ability to scale up and down with demand, but because their application has so many microservices and dependencies, finding performance problems was difficult. When one service was overloaded, it could take hours of work combing through logs to find the problem. Deploying Istio gave them clarity. For over a year now, Istio has let them see where the traffic to any given service is coming from, so they can resolve problems much more quickly.

“Istio was a missing piece in the Kubernetes ecosystem. Kubernetes gave us the ability to distribute an application, but Istio gave us the ability to understand the application,” says Tim Kelton, a Descartes Labs co-founder. Now that they can better observe their app, they’re beginning to adopt some of Istio’s advanced networking capabilities as well.

The future is now

Using Istio is about to get a whole lot easier for Google Cloud users, with Istio on GKE. Coming to beta next month, and with dozens of early access customers already running it in production, Istio on GKE layers a service mesh onto existing GKE clusters, and gathers telemetry about the containers running therein. That telemetry gets sent to Stackdriver or Prometheus, so you can monitor the health of the services running in your containers the same way that Google SREs do: by monitoring the so-called Golden Signals (traffic, error rates, and latencies). You can also audit dependencies among services or analyze performance with tracing. Perhaps best of all, you can also improve your security by turning on mTLS, which encrypts all data in transit between your services. You can set all this in motion simply by checking the “Enable Istio” box in the GKE management console.
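As a rough illustration of what those three signals measure, the Python sketch below derives them from a handful of request records. The `Request` record, the `golden_signals` function, and the choice of a nearest-rank 99th percentile are all hypothetical; this is not how Stackdriver or Prometheus computes or exposes these values, only a sketch of the underlying arithmetic.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """Hypothetical per-request telemetry record."""
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Summarize traffic, error rate, and tail latency over one time window."""
    if not requests:
        return {"traffic_rps": 0.0, "error_rate": 0.0, "p99_latency_ms": 0.0}
    latencies = sorted(r.latency_ms for r in requests)
    # Count server-side failures (5xx) as errors; this cutoff is an assumption.
    errors = sum(1 for r in requests if r.status >= 500)
    # Nearest-rank percentile: the smallest value with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(latencies)))
    return {
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "p99_latency_ms": latencies[rank - 1],
    }


if __name__ == "__main__":
    sample = [Request(10, 200), Request(20, 200), Request(30, 500), Request(400, 200)]
    print(golden_signals(sample, window_s=2.0))
```

The value of a mesh is that every service reports these numbers the same way, because the sidecar collects them rather than each team's own instrumentation.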

Overall, Istio on GKE is just one way we’ll integrate Istio into our computing portfolio. In the coming months, look for us to integrate Istio into broad swaths of our offerings, as part of the Cloud Services Platform. For instance, this summer, we announced our plans for GKE On-Prem, a binary-compatible version of GKE that you can run in your own private data center. You’ll of course want visibility into your serverless applications, so look for Istio there as well. As with Kubernetes, we want you to be able to run Istio however you see fit—on your own, or as a hands-off managed service.

KeyBank, a major US financial institution, elected to collaborate with us for these very reasons. “Google created Kubernetes and Istio so they were the obvious cloud to partner with as we look to bring containerized applications into our data centers. Put simply, the Cloud Services Platform provides us the security we need, the portability we want, and the productivity that our developers crave,” says Keith Silvestri, Chief Technology Officer, KeyBank.

We’re not the only ones who believe in Istio. Some of the biggest names in the industry are contributing to Istio—companies like IBM, Red Hat, and VMware—all bringing their expertise and experience to bear. And because Istio is open source, you have the freedom to add the functionality you need and integrate it into your environment that much faster.

Here at Google, SREs talk a lot about avoiding “success disasters”: services whose adoption outperforms your expectations, and thus, your ability to deliver them. Whether you’re a bona fide cloud-native, or you’re just starting to think about modernizing your IT environment, building your applications on Kubernetes and containers will take you a long way. But you also need to think about how you’ll manage an environment that exceeds your wildest expectations. Istio can take you the rest of the way, so you can plan on plain-old “success.”

1. PaaSView for the Developer: The Modern Development Experience (IDC #US44223218, Aug 2018)