Identity & Security
Zero trust workload security with GKE, Traffic Director, and CA Service
At the core of a zero trust approach to security is the idea that trust needs to be established via multiple mechanisms and continuously verified. Internally, Google has applied this thinking to the end-to-end process of running production systems and protecting workloads on cloud-native infrastructure, an approach we call BeyondProd. Establishing and verifying trust in such a system requires: 1) that each workload has a unique workload identity and credentials for authentication, and 2) an authorization layer that determines which components of the system can communicate with other components.
Consider a cloud-native architecture where apps are broken into microservices. In-process procedure calls and data-transfers become remote procedure calls (RPCs) over the network between microservices. In this scenario, a service mesh manages communications between microservices, and is a natural place to embed key controls that implement a zero trust approach. Securing RPCs is extremely important: each microservice needs to ensure that it receives RPCs only from authenticated and authorized senders, is sending RPCs only to intended recipients, and has guarantees that RPCs are not modified in transit. Therefore, the service mesh needs to provide service identities, peer authentication based on those service identities, encryption of communication between authenticated peer identities, and authorization of service-to-service communication based on the service identities (and possibly other attributes).
To provide managed service mesh security that meets these requirements, we are happy to announce the general availability of new security capabilities for Traffic Director which provide fully-managed workload credentials for Google Kubernetes Engine (GKE) via CA Service, and policy enforcement to govern workload communications. The fully-managed credential provides the foundation for expressing workload identities and securing connections between workloads leveraging mutual TLS (mTLS), while following zero trust principles.
As it stands today, the use of mTLS for service-to-service security involves considerable toil and overhead for developers, SREs, and deployment teams. Developers have to write code to load certificates and keys from pre-configured locations and use them in their service-to-service connections. They typically also have to perform additional framework or application-based security checks on those connections. Adding complexity, SREs and deployment teams have to deploy keys and certificates on all the nodes where they will be needed and track their expiry. The replacement or rotation of these certificates involves creating CSRs (certificate signing requests), getting them signed by the issuing CA, installing the signed certificates, and installing the appropriate root certificates at peer locations. The process of rotation is critical, as letting an identity or root certificate expire means an outage that can take services offline for an extended amount of time.
This security logic cannot be hardcoded because the routing of RPCs is orchestrated by the traffic control plane and, as microservices are scaled to span multiple deployment infrastructures, it is difficult for the application code to verify identities and perform authorization decisions based on them.
Our solution addresses these issues by creating seamless integrations between the Certificate Authorities’ infrastructure, the compute/deployment infrastructure, and the service mesh infrastructure. In our implementation, Certificate Authority Service (CAS) provides certificates for the service mesh, the GKE infrastructure integrates with CAS, and the Traffic Director control plane integrates with GKE to instruct data plane entities to use these certificates (and keys) for creating mTLS connections with their peers.
The GKE cluster’s mesh certificate component continuously talks to the CA pools to mint service identity certificates and make these certificates available to intended workloads running in GKE pods. Issuing Certificate Authorities are automatically renewed and the new roots pushed to clients before expiry. Traffic Director is the service mesh control plane which provides policy, configuration, and intelligence to data plane entities, and supplies configurations to the client and server applications. These configurations contain the necessary transport and application-level security information to enable the consuming services to create mTLS connections and apply the appropriate authorization policies to the RPCs that flow through those connections. Finally, workloads consume the security configuration to create the appropriate mTLS connections and apply the provided security policies.
To learn more, check out the Traffic Director user guide and see how to setup Traffic Director and the accompanying services in your environment to take a zero trust approach to securing your GKE workloads.