Google Kubernetes Engine reliability guide

Google Kubernetes Engine (GKE) is a system for operating containerized applications in the cloud, at scale. GKE deploys, manages, and provisions resources for your containerized applications. The GKE environment consists of Compute Engine instances grouped together to form a cluster.

Best practices

  • Preparing Google Kubernetes Engine for production
    • Detailed guidance for onboarding your containerized workloads to GKE.
    • Multi-zonal versus multi-regional deployments for high availability architectures.
    • Tips for automating common cluster operations such as provisioning, node repair, and scaling up under load.
    • Using a service mesh to implement reliability features such as traffic management, observability, canary testing, and chaos testing with minimal engineering investment.
  • Best practices for operating containers - how to use logging mechanisms, keep containers stateless and immutable, monitor applications, and configure liveness and readiness probes.
  • Best practices for building containers - how to package a single application per container, handle process identifiers (PIDs), optimize for the Docker build cache, and build smaller images for faster upload and download times.
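
As an illustration of the liveness and readiness probes mentioned above, here is a minimal sketch of an application exposing two HTTP endpoints for Kubernetes to poll. It uses only the Python standard library. The paths /healthz and /readyz, the port, the ready flag, and the start_probe_server helper are assumptions chosen for this example, not names taken from the linked guides.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness flag: the application sets it once startup work
# (cache warm-up, dependency checks, and so on) has finished.
ready = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    """Serves the two probe endpoints assumed in this sketch."""

    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer a request.
            self._reply(200, "ok")
        elif self.path == "/readyz":
            # Readiness: report 503 until startup has finished, so the
            # Pod receives no traffic before it can handle it.
            if ready.is_set():
                self._reply(200, "ready")
            else:
                self._reply(503, "not ready")
        else:
            self._reply(404, "not found")

    def _reply(self, status, text):
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Probes fire every few seconds; skip per-request logging.
        pass


def start_probe_server(port=8080):
    """Start the probe server on a background thread and return it."""
    server = ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a Pod spec, the container's livenessProbe and readinessProbe fields would each use an httpGet check pointing at the matching path and port. Keeping the two endpoints separate lets a slow-starting container be withheld from traffic without being restarted.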