This document helps you plan, design, and implement your migration from a self-managed Kubernetes environment to Google Kubernetes Engine (GKE). Moving apps from one environment to another can be challenging, so you need to plan and execute your migration carefully.
This document is part of a multi-part series about migrating to Google Cloud. If you're interested in an overview of the series, see Migration to Google Cloud: Choosing your migration path.
This document is part of a series that discusses migrating containers to Google Cloud:
- Migrating containers to Google Cloud: Getting started
- Migrating containers to Google Cloud: Migrating Kubernetes to GKE (this document)
- Migrating containers to Google Cloud: Migrating to a new GKE environment
- Migrating containers to Google Cloud: Migrating to a multi-cluster GKE environment with Multi Cluster Service Discovery and Multi Cluster Ingress
- Migrating containers to Google Cloud: Migrating from OpenShift to Anthos
This document is useful if you're planning to migrate from a self-managed Kubernetes environment to GKE. Your environment might be running on premises, in a private hosting environment, or in another cloud provider. This document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.
By using GKE, you get the following benefits:
- You don't have to manage control plane (master) nodes.
- You can use Google expertise for security, networking, Kubernetes upgrades, and node auto-provisioning.
- You can automatically scale your clusters by adding or removing nodes, and automatically tune the CPU and memory requests and limits of your Pods.
This document assumes that you're familiar with the following topics:
- Creating different types of GKE clusters
- Managing, configuring, and deploying GKE clusters
- Preparing a GKE environment for production
- Understanding GKE security
- Hardening your cluster's security
The following diagram illustrates the path of your migration journey.
During each migration step, you follow the phases defined in Migration to Google Cloud: Getting started:
- Assessing and discovering your workloads.
- Planning and building a foundation.
- Deploying your workloads.
- Optimizing your environment.
Assessing your environment
In the assessment phase, you determine the requirements and dependencies to migrate your self-managed Kubernetes environment to GKE:
- Build a comprehensive inventory of your apps.
- Catalog your apps according to their properties and dependencies.
- Train and educate your teams on Google Cloud.
- Build an experiment and proof of concept on Google Cloud.
- Calculate the total cost of ownership (TCO) of the target environment.
- Choose the workloads that you want to migrate first.
The following sections rely on Migration to Google Cloud: Assessing and discovering your workloads.
Build your inventories
To scope your migration, you must understand your current Kubernetes environment. You first gather information about your clusters, and then you focus on your workloads deployed in those clusters and the workloads' dependencies. At the end of the assessment phase, you have two inventories: one for your clusters, and one for the workloads deployed in those clusters.
To build the inventory of your clusters, consider the following for each cluster:
- Number and type of nodes. Knowing how many nodes you have in your current environment and the characteristics of each node helps you size your clusters when you move to GKE. The nodes in your new environment might run on a different hardware-architecture generation than the ones in your current environment. Because the performance of each architecture generation differs, the number of nodes that you need in your new environment might differ from the number that you have now. Evaluate any specialized hardware that your nodes use, such as high-performance storage devices, GPUs, and TPUs.
- Internal or external cluster. Evaluate which actors, either internal or external to your environment, each cluster is exposed to. To support your use cases, this evaluation covers the workloads running in the cluster and the interfaces that interact with your clusters.
- Multi-tenancy. If you're managing multi-tenant clusters in your environment, assess whether your multi-tenancy approach works in your new Google Cloud environment. Now is a good time to evaluate how to improve your multi-tenant clusters, because your multi-tenancy strategy influences how you build your foundation on Google Cloud.
- Kubernetes version. Gather information about the Kubernetes version of your clusters to assess if there is a mismatch between those versions and the ones available in GKE. If you're running an older or a recently released version, you might be using features that are unavailable in GKE. The features might be deprecated, or the Kubernetes version that ships them is not yet available in GKE.
- Kubernetes upgrade cycle. To maintain a reliable environment, understand how you're handling Kubernetes upgrades and how your upgrade cycle relates to GKE upgrades.
- Node pools. If you're using any form of node grouping, you might want to consider how these groupings map to the concept of node pools in GKE because your grouping criteria might not be suitable for GKE.
- Node initialization. Assess how you initialize each node before marking it as available to run your workloads so you can port those initialization procedures over to GKE.
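As a starting point for the cluster inventory, you might summarize the node list that `kubectl get nodes -o json` returns. The following Python sketch is a minimal, hypothetical helper (`summarize_nodes` isn't part of any tool). It assumes that your nodes carry the well-known `node.kubernetes.io/instance-type` label and that GPUs are advertised as the `nvidia.com/gpu` extended resource, which depends on the device plugin that you run.

```python
from collections import Counter


def summarize_nodes(node_list: dict) -> dict:
    """Summarize a NodeList, such as the output of `kubectl get nodes -o json`."""
    shapes = Counter()  # "instance type (cpu, memory)" -> node count
    kubelet_versions = Counter()
    architectures = Counter()
    gpu_nodes = 0
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels") or {}
        status = node.get("status", {})
        capacity = status.get("capacity", {})
        info = status.get("nodeInfo", {})
        # Assumption: the instance-type label is set; fall back to "unknown".
        instance_type = labels.get("node.kubernetes.io/instance-type", "unknown")
        shape = f"{instance_type} (cpu={capacity.get('cpu')}, memory={capacity.get('memory')})"
        shapes[shape] += 1
        kubelet_versions[info.get("kubeletVersion", "unknown")] += 1
        architectures[info.get("architecture", "unknown")] += 1
        # Assumption: GPUs are exposed through the NVIDIA device plugin resource name.
        if int(capacity.get("nvidia.com/gpu", "0")) > 0:
            gpu_nodes += 1
    return {
        "shapes": dict(shapes),
        "kubelet_versions": dict(kubelet_versions),
        "architectures": dict(architectures),
        "gpu_nodes": gpu_nodes,
    }
```

For example, save the output of `kubectl get nodes -o json` to a file, load it with `json.load`, and pass the result to `summarize_nodes`. Comparing `kubelet_versions` with the versions available in GKE gives you a first view of any version mismatch.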
The following items that you assess in your inventory focus on the security of your infrastructure and Kubernetes clusters:
- Namespaces. If you use Kubernetes Namespaces in your clusters to logically separate resources, assess which resources are in each Namespace, and understand why you created this separation. For example, you might be using Namespaces as part of your multi-tenancy strategy. You might have workloads deployed in Namespaces that are reserved for Kubernetes system components, and you might not have as much control over those Namespaces in GKE.
- Role-based access control (RBAC). If you use RBAC authorization in your clusters, list and describe all the ClusterRoles and ClusterRoleBindings that you configured in your clusters.
- Network policies. List all network policies that you configured in your clusters, and understand how network policies work in GKE.
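- Pod security policies and contexts. Capture information about the PodSecurityPolicies and Pod security contexts that you configured in your clusters, and learn how they work in GKE. PodSecurityPolicy was removed in Kubernetes 1.25, so if you rely on it, plan to replace it, for example with Pod Security Admission.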
- Service accounts. If any processes in your cluster interact with the Kubernetes API server, capture information about the service accounts that they use.
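To describe your RBAC configuration, you might flatten the ClusterRoleBindings in each cluster into one row per subject. The following Python sketch assumes that you saved the output of `kubectl get clusterrolebindings -o json`; the helper name is hypothetical. It also flags bindings to the built-in `cluster-admin` ClusterRole, which usually deserve a closer review before you recreate them in GKE.

```python
def summarize_cluster_role_bindings(binding_list: dict) -> list:
    """Flatten a ClusterRoleBindingList into one row per subject."""
    rows = []
    for binding in binding_list.get("items", []):
        role = binding.get("roleRef", {}).get("name", "")
        # `subjects` is optional, so it can be missing or null.
        for subject in binding.get("subjects") or []:
            kind = subject.get("kind", "")
            name = subject.get("name", "")
            if kind == "ServiceAccount":
                # Service account subjects are namespaced.
                name = f"{subject.get('namespace', '')}/{name}"
            rows.append({
                "binding": binding.get("metadata", {}).get("name", ""),
                "role": role,
                "subject_kind": kind,
                "subject": name,
                # Bindings to cluster-admin grant full control of the cluster.
                "cluster_admin": role == "cluster-admin",
            })
    return rows
```

You can apply the same approach to namespaced RoleBindings, and the `ServiceAccount` rows double as a starting list for the service accounts inventory.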
After you complete the Kubernetes clusters inventory and assess the security of your environment, build the inventory of the workloads deployed in those clusters. When evaluating your workloads, gather information about the following aspects:
- Pods and controllers. To size the clusters in your new environment, assess how many instances of each workload you have deployed, and if you're using ResourceQuotas and compute resource consumption limits. Gather information about the workloads that are running on the control plane nodes of each cluster and the controllers that each workload uses. For example, how many Deployments are you using? How many DaemonSets are you using?
- Jobs and CronJobs. Your clusters and workloads might need to run Jobs or CronJobs as part of their initialization or operation procedures. Assess how many instances of Jobs and CronJobs you have deployed, and the responsibilities and completion criteria for each instance.
- Horizontal Pod Autoscalers. To migrate your autoscaling policies to the new environment, learn how the Horizontal Pod Autoscaler works on GKE.
- Stateless and stateful workloads. Stateless workloads don't store data or state in the cluster or in persistent storage. Stateful applications save data for later use. For each workload, assess which components are stateless and which are stateful, because migrating stateful workloads is typically harder than migrating stateless ones.
- Kubernetes features. From the cluster inventory, you know which Kubernetes version each cluster runs. Review the release notes of each Kubernetes version to know which features it ships and which features it deprecates. Then assess your workloads against the Kubernetes features that you need. The goal of this task is to know whether you're using deprecated features or features that are not yet available in GKE. If you find any unavailable features, migrate away from deprecated features and adopt the new ones when they're available in GKE.
- Storage. For stateful workloads, assess whether they use PersistentVolumeClaims. List any storage requirements, such as size and access mode, and how these PersistentVolumeClaims map to PersistentVolumes. To account for future growth, assess whether you need to expand any PersistentVolumeClaim.
- Configuration and secret injection. To avoid rebuilding your deployable artifacts every time there is a change in the configuration of your environment, inject configuration and secrets into Pods using ConfigMaps and Secrets. For each workload, assess which ConfigMaps and Secrets that workload is using, and how you're populating those objects.
- Dependencies. Your workloads probably don't work in isolation. They might have dependencies, either internal to the cluster or on external systems. For each workload, capture its dependencies, and whether the workload can tolerate those dependencies being unavailable. For example, common dependencies include distributed file systems, databases, secret distribution platforms, identity and access management systems, service discovery mechanisms, and any other external systems.
- Kubernetes Services. To expose your workloads to internal and external clients, use Services. For each Service, you need to know its type. For externally exposed services, assess how that service interacts with the rest of your infrastructure. For example, how is your infrastructure supporting LoadBalancer services and Ingress objects? Which Ingress controllers did you deploy in your clusters?
- Service mesh. If you're using a service mesh in your environment, you assess how it's configured. You also need to know how many clusters it spans, which services are part of the mesh, and how you modify the topology of the mesh. For example, are you using the auto-injection mechanism to automatically add services to the mesh?
- Taints, tolerations, affinity, and anti-affinity. For each Pod and Node, assess whether you configured any Node taints, Pod tolerations, or affinities to customize the scheduling of Pods in your Kubernetes clusters. These properties might also give you insights into possible non-homogeneous Node or Pod configurations, and might mean that the Pods, the Nodes, or both need to be assessed with special focus and care. For example, if you configured a particular set of Pods to be scheduled only on certain Nodes in your Kubernetes cluster, it might mean that the Pods need specialized resources that are available only on those Nodes.
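To start the workload inventory, you might run a script over the manifests that you deploy, loaded as a list of dictionaries (for example, with a YAML parser). The following Python sketch counts objects per Namespace and kind, flags workloads that are likely stateful (StatefulSets, or Pod templates that mount a PersistentVolumeClaim), and flags a few apiVersions that Kubernetes no longer serves. The `REMOVED_APIS` table is a small, illustrative subset, not a complete list; check the Kubernetes deprecated API migration guide for the full set. Run it against the manifests stored in your repositories rather than objects read back from a cluster, because the API server returns objects in the version that the client requests, which can hide deprecated versions.

```python
from collections import Counter

# Illustrative subset: (apiVersion, kind) -> Kubernetes version that stopped serving it.
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): "1.16",
    ("apps/v1beta2", "Deployment"): "1.16",
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("networking.k8s.io/v1beta1", "Ingress"): "1.22",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("policy/v1beta1", "PodSecurityPolicy"): "1.25",
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): "1.26",
}


def pod_spec_of(obj: dict) -> dict:
    """Return the Pod spec embedded in a workload object, or {} if there is none."""
    spec = obj.get("spec", {})
    kind = obj.get("kind")
    if kind == "Pod":
        return spec
    if kind == "CronJob":
        return spec.get("jobTemplate", {}).get("spec", {}).get("template", {}).get("spec", {})
    # Deployments, StatefulSets, DaemonSets, ReplicaSets, and Jobs use spec.template.
    return spec.get("template", {}).get("spec", {})


def is_stateful(obj: dict) -> bool:
    """Heuristic: StatefulSets, or Pod templates that mount a PersistentVolumeClaim."""
    if obj.get("kind") == "StatefulSet":
        return True
    volumes = pod_spec_of(obj).get("volumes") or []
    return any("persistentVolumeClaim" in volume for volume in volumes)


def inventory_workloads(objects: list) -> dict:
    counts = Counter()  # (namespace, kind) -> number of objects
    stateful = []
    removed_apis = []
    for obj in objects:
        kind = obj.get("kind", "Unknown")
        meta = obj.get("metadata", {})
        namespace = meta.get("namespace", "default")
        name = f"{namespace}/{meta.get('name', '?')}"
        counts[(namespace, kind)] += 1
        if is_stateful(obj):
            stateful.append(name)
        key = (obj.get("apiVersion"), kind)
        if key in REMOVED_APIS:
            removed_apis.append((name, key[0], REMOVED_APIS[key]))
    return {"counts": dict(counts), "stateful": stateful, "removed_apis": removed_apis}
```

The stateful check is only a heuristic: a workload can also keep state in an external system, which you capture as a dependency.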
After you assess your clusters and their workloads, evaluate the rest of the supporting services and aspects in your infrastructure, such as the following:
- StorageClasses and PersistentVolumes. Assess how your infrastructure is backing PersistentVolumeClaims by listing StorageClasses for dynamic provisioning, and statically provisioned PersistentVolumes. For each PersistentVolume, consider the following: capacity, volume mode, access mode, class, reclaim policy, mount options, and node affinity.
- VolumeSnapshots and VolumeSnapshotContents. For each PersistentVolume, assess whether you configured any VolumeSnapshots, and whether you need to migrate any existing VolumeSnapshotContents.
- Data storage. If you depend on external systems to provision PersistentVolumes, provide a way for the workloads in your GKE environment to use those systems. Data locality affects the performance of stateful workloads, because the latency between your external systems and your GKE environment generally increases with the distance between them. For each external data storage system, consider its type, such as block volumes, file storage, or object storage, and any performance and availability requirements that it needs to satisfy.
- Logging, monitoring, and tracing. Capture information on your monitoring, logging, and tracing systems. You can integrate your systems with Google Cloud's operations suite, or you can use Google Cloud's operations suite as your only monitoring, logging, and tracing tool. For example, you can integrate Google Cloud's operations suite with other services, set up logging interfaces for your preferred programming languages, and use the Cloud Logging agent on your VMs. GKE integrates with Google Cloud's operations suite and Cloud Audit Logs. You can also customize Cloud Logging logs for GKE with Fluentd and then process logs at scale using Dataflow.
- Custom resources. Collect information about any custom Kubernetes resources that you might have deployed in your clusters, because they might not work in GKE, or you might need to modify them. For example, if a custom resource interacts with an external system, you assess if that's applicable to your Google Cloud environment.
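To list the StorageClasses and PersistentVolumes that back your PersistentVolumeClaims, you might group PersistentVolumes by StorageClass and collect the properties that you need to reproduce them. The following Python sketch assumes that you saved the output of `kubectl get pv -o json`; the helper name is hypothetical, and it covers only a subset of the properties listed above (capacity, access mode, volume mode, and reclaim policy).

```python
from collections import defaultdict


def summarize_persistent_volumes(pv_list: dict) -> dict:
    """Group PersistentVolumes by StorageClass and collect migration-relevant properties."""
    groups = defaultdict(lambda: {
        "count": 0,
        "capacities": [],
        "access_modes": set(),
        "volume_modes": set(),
        "reclaim_policies": set(),
    })
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        # Statically provisioned volumes might have no StorageClass.
        storage_class = spec.get("storageClassName") or "(no class)"
        group = groups[storage_class]
        group["count"] += 1
        group["capacities"].append(spec.get("capacity", {}).get("storage"))
        group["access_modes"].update(spec.get("accessModes", []))
        # Kubernetes defaults volumeMode to Filesystem.
        group["volume_modes"].add(spec.get("volumeMode", "Filesystem"))
        # Manually created PersistentVolumes default to the Retain reclaim policy.
        group["reclaim_policies"].add(spec.get("persistentVolumeReclaimPolicy", "Retain"))
    # Convert sets to sorted lists so that the result is stable and JSON-friendly.
    return {
        name: {key: sorted(value) if isinstance(value, set) else value
               for key, value in group.items()}
        for name, group in groups.items()
    }
```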
Complete the assessment
After building the inventories related to your Kubernetes clusters and workloads, complete the rest of the activities of the assessment phase in Migration to Google Cloud: Assessing and discovering your workloads.
Planning and building your foundation
In the planning and building phase, you provision and configure the cloud infrastructure and services that support your workloads on Google Cloud:
- Build a resource hierarchy.
- Configure identity and access management.
- Set up billing.
- Set up network connectivity.
- Harden your security.
- Set up monitoring and alerting.
If you've already adopted infrastructure-as-code to manage the workloads in your Kubernetes environment, you can apply the same process to your Google Cloud environment. You analyze your Kubernetes descriptors because some Google Cloud resources that GKE automatically provisions for you are configurable by using Kubernetes labels and annotations. For example, you can provision an internal load balancer instead of an external one by adding an annotation to a LoadBalancer Service.
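For example, the following Python sketch renders a LoadBalancer Service that asks GKE for an internal load balancer. It assumes a GKE version that recognizes the `networking.gke.io/load-balancer-type: "Internal"` annotation (older GKE versions used `cloud.google.com/load-balancer-type`). The function name, Service name, Namespace, label, and ports are hypothetical. The output is JSON, which `kubectl apply -f` accepts.

```python
import json


def internal_lb_service(name: str, namespace: str, app_label: str,
                        port: int, target_port: int) -> dict:
    """Build a LoadBalancer Service that GKE provisions as an internal load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Without this annotation, GKE provisions an external load balancer.
            "annotations": {"networking.gke.io/load-balancer-type": "Internal"},
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(internal_lb_service("orders", "shop", "orders", 80, 8080), indent=2))
```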
The following sections rely on Migration to Google Cloud: Building your foundation.
Build a resource hierarchy
To design an efficient resource hierarchy, consider how your business and organizational structures map to Google Cloud as detailed in Migration to Google Cloud: Building your foundation and Preparing a GKE environment for production.
For example, if you need a multi-tenant environment on GKE, you can choose between the following options:
- Creating one Google Cloud project for each tenant.
- Sharing one project among different tenants, and provisioning multiple GKE clusters.
- Using Kubernetes namespaces.
Your choice depends on your isolation, complexity, and scalability needs. For example, having one project per tenant isolates the tenants from one another, but the resource hierarchy becomes more complex to manage due to the high number of projects. Conversely, managing Kubernetes Namespaces is easier than managing a complex resource hierarchy, but this option doesn't guarantee as much isolation. For example, tenants that share a cluster also share its control plane.
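If you choose Namespaces, you typically pair each tenant Namespace with resource limits and network isolation. The following Python sketch generates one possible set of per-tenant objects: a Namespace, a ResourceQuota, and a NetworkPolicy that admits ingress traffic only from Pods in the same Namespace. The naming scheme, the `tenant` label, and the quota values are hypothetical, and the NetworkPolicy takes effect only if network policy enforcement is enabled in your GKE cluster.

```python
def tenant_manifests(tenant: str, cpu: str, memory: str, max_pods: int) -> list:
    """Return hypothetical per-tenant objects for Namespace-based multi-tenancy."""
    namespace = f"tenant-{tenant}"
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"tenant": tenant}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "tenant-quota", "namespace": namespace},
            "spec": {"hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }},
        },
        {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "same-namespace-only", "namespace": namespace},
            "spec": {
                "podSelector": {},  # applies to every Pod in the Namespace
                "policyTypes": ["Ingress"],
                # A podSelector without a namespaceSelector matches Pods in this Namespace only.
                "ingress": [{"from": [{"podSelector": {}}]}],
            },
        },
    ]
```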
Configure identity and access management
Identity and Access Management provides the tools to centrally configure fine-grained access control to cloud resources. For more information, see Identity and Access Management and Preparing a GKE environment for production.
Review how Kubernetes RBAC interacts with Identity and Access Management in Google Cloud, and configure RBAC according to the requirements that you gathered in the assessment phase.
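On GKE, Kubernetes RBAC can grant permissions to Google Cloud identities: a `User` subject can be a Google account email address, and, if you enable Google Groups for RBAC on the cluster, a `Group` subject can be a Google Group email address. The following Python sketch builds a RoleBinding that grants an existing ClusterRole, such as the built-in `view` role, to a group within one Namespace. The binding name, group address, and Namespace are hypothetical, and the setup that Google Groups for RBAC requires (such as nesting your groups in a `gke-security-groups` group) isn't shown.

```python
def group_role_binding(name: str, namespace: str, group_email: str,
                       cluster_role: str = "view") -> dict:
    """Grant an existing ClusterRole to a Google Group within one Namespace."""
    if "@" not in group_email:
        raise ValueError("expected a Google Group email address")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        # Referencing a ClusterRole from a RoleBinding scopes its permissions to this Namespace.
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": cluster_role},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group_email}],
    }
```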
Set up billing
Set up network connectivity
Network configuration is a fundamental aspect of your environment. Assess the GKE network model and the connectivity requirements of your workloads. Then, you can start planning your network configuration. For more information, see connectivity and networking.
Harden your security
To protect your critical assets, you need to understand the differences between your environment's security model and Google Cloud's model, and how to harden the security of your GKE clusters. For more information, see security.
Set up monitoring and alerting
Having a clear picture of how your infrastructure and workloads are performing is key to finding areas of improvement. GKE has deep integrations with Google Cloud's operations suite, so you get logging and monitoring information about your GKE clusters and workloads inside those clusters. For more information, see monitoring and alerting.
Deploying your workloads
In the deployment phase, you do the following:
- Provision and configure your runtime platform and environments.
- Migrate data from your old environment to your new environment.
- Deploy your workloads.
The following sections rely on Migration to Google Cloud: Transferring large datasets, Migration to Google Cloud: Deploying your workloads, and Migration to Google Cloud: Migrating from manual deployments to automated, containerized deployments.
Provision and configure your runtime platform and environments
Before moving any workload to your new Google Cloud environment, you provision the GKE clusters.
After the assessment phase, you now know how to provision the GKE clusters in your new Google Cloud environment to meet your needs. You can provision the following:
- The number of clusters, the number of nodes per cluster, the types of clusters, and the configuration of each cluster and each node.
- The number of private clusters.
- The choice between VPC-native and routes-based networking.
- The Kubernetes versions that you need in your GKE clusters.
- The node pools to logically group the nodes in your GKE clusters, and if you need to automatically create node pools with node auto-provisioning.
- The initialization procedures that you can port from your environment to the GKE environment and new procedures that you can implement. For example, you can automatically bootstrap GKE nodes by implementing one or more initialization procedures, which might need to be privileged, for each node or node pool in your clusters.
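If you script cluster creation, you might derive the `gcloud container clusters create` arguments from the decisions in the preceding list. The following Python sketch only builds the command; it doesn't run it. It covers a regional, VPC-native cluster (`--enable-ip-alias`) and, optionally, private nodes. The function name, parameters, and example values are hypothetical; check the current gcloud reference for the flags that your GKE version and release channel support.

```python
import shlex


def cluster_create_command(name, region, machine_type, nodes_per_zone,
                           version=None, private=False, control_plane_cidr=None):
    """Build (but don't run) a `gcloud container clusters create` command."""
    args = [
        "gcloud", "container", "clusters", "create", name,
        f"--region={region}",
        f"--machine-type={machine_type}",
        # In a regional cluster, --num-nodes is the node count per zone.
        f"--num-nodes={nodes_per_zone}",
        "--enable-ip-alias",  # VPC-native networking
    ]
    if version:
        args.append(f"--cluster-version={version}")
    if private:
        if not control_plane_cidr:
            raise ValueError("a private cluster needs a /28 range for the control plane")
        args += ["--enable-private-nodes", f"--master-ipv4-cidr={control_plane_cidr}"]
    return args


if __name__ == "__main__":
    cmd = cluster_create_command("prod-1", "us-central1", "e2-standard-4", 2,
                                 private=True, control_plane_cidr="172.16.0.0/28")
    print(shlex.join(cmd))
```

Keeping the command in code, or better in an infrastructure-as-code tool, makes it easier to provision identical clusters later.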
Migrate data from your old environment to your new environment
Now you can transfer data that your stateful workloads need.
Migration to Google Cloud: Transferring large datasets contains guidance on this topic. If you're planning to modernize your workloads to apply a microservices architecture or if you've already adopted it, or if you need guidance about cut-over windows and scheduled maintenance strategies, see Migrating a monolithic application to microservices on GKE. For more information about the data storage options that you have on GKE, see storage configuration. For example, you can use Compute Engine persistent disks, either zonal or replicated across a region, or you can use Filestore.
You provision all necessary storage infrastructure before moving your data. If you're using any StorageClass provisioners, you configure them in the new clusters.
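For example, if your workloads used a dynamic provisioner in your old environment, you might recreate an equivalent StorageClass that uses the Compute Engine persistent disk CSI driver (`pd.csi.storage.gke.io`). The following Python sketch assumes that this driver is enabled in your cluster; the function name, class name, and defaults are hypothetical. Setting `reclaim_policy` to `Retain` while you migrate data keeps the underlying disk if a PersistentVolumeClaim is deleted by mistake.

```python
def pd_storage_class(name: str, disk_type: str = "pd-balanced",
                     regional: bool = False, reclaim_policy: str = "Delete") -> dict:
    """Build a StorageClass backed by the Compute Engine persistent disk CSI driver."""
    parameters = {"type": disk_type}  # for example, pd-standard, pd-balanced, or pd-ssd
    if regional:
        # Replicate each disk across two zones in the region.
        parameters["replication-type"] = "regional-pd"
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "pd.csi.storage.gke.io",
        "parameters": parameters,
        # Delay provisioning until a Pod is scheduled, so the disk lands in that Pod's zone.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
        "reclaimPolicy": reclaim_policy,
    }
```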
Deploy your workloads
To deploy your workloads, you design and implement a deployment process, according to your requirements. If you're not satisfied with your deployment processes and want to migrate to a more modern, automated process, see Migration to Google Cloud: Migrating from manual deployments to containers and automation. It contains guidance to migrate away from manual deployments to container orchestration tools and automation. The deployment phase is also a chance to modernize your workloads. For example, if you're running bare Pods (Pods that no controller manages) in your environment, consider migrating those workloads to Deployments.
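As an illustration of that last modernization step, the following Python sketch wraps a bare Pod manifest in an `apps/v1` Deployment. The helper is hypothetical and makes a few assumptions: it reuses the Pod's labels as the Deployment selector (or adds an `app` label if the Pod has none), drops any `nodeName` pinning so that the scheduler can place the replicas, and rejects Pods whose `restartPolicy` isn't `Always`, because Deployments only accept that policy and such Pods are usually better served by a Job.

```python
import copy


def pod_to_deployment(pod: dict, replicas: int = 1) -> dict:
    """Wrap a bare Pod manifest in an apps/v1 Deployment (hypothetical helper)."""
    meta = pod.get("metadata", {})
    name = meta["name"]
    spec = copy.deepcopy(pod["spec"])  # don't mutate the caller's manifest
    # Deployments only accept restartPolicy Always, which is also the Pod default.
    if spec.get("restartPolicy", "Always") != "Always":
        raise ValueError(f"{name}: restartPolicy {spec['restartPolicy']} suggests a Job, not a Deployment")
    # A bare Pod might be pinned to a node; let the scheduler place replicas instead.
    spec.pop("nodeName", None)
    # Reuse the Pod's labels as the selector, or fall back to a hypothetical app label.
    labels = dict(meta.get("labels") or {"app": name})
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": meta.get("namespace", "default"), "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": spec},
        },
    }
```

Because a Deployment's selector is immutable after creation, review the generated labels before you apply the manifest.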
When your deployment process is ready, you can deploy your workloads to GKE.
Optimizing your environment
Optimization is the last phase of your migration. In this phase, you make your environment more efficient than it was before by executing multiple iterations of a repeatable loop until your environment meets your optimization requirements. The steps of this repeatable loop are as follows:
- Assessing your current environment, teams, and optimization loop.
- Establishing your optimization requirements and goals.
- Optimizing your environment and your teams.
- Tuning the optimization loop.
The following sections rely on Migration to Google Cloud: Optimizing your environment.
Assess your current environment, teams, and optimization loop
While the first assessment focuses on the migration from your environment to GKE, this assessment is tailored for the optimization phase.
Establish your optimization requirements
Review the following optimization requirements for your GKE environment:
- Implement advanced deployment processes. Processes like canary deployments or blue/green deployments give you added flexibility and can increase the reliability of your environment, extend testing, and reduce the impact of any issue for your users.
- Configure a service mesh. By introducing a service mesh to your environment, you can use features such as observability, traffic management, and mutual authentication for your services, and reduce the strain on your DevOps teams. You can deploy a multi-cluster service mesh to better segment your workloads or an expanded service mesh to support your migration to the new environment.
- Set up automatic scaling. You have different, complementary options to automatically scale your GKE environment: you can automatically scale your clusters and the workloads inside each cluster. By configuring the cluster autoscaler, you can automatically resize a GKE cluster based on the demands of your workloads, by adding worker nodes to the cluster or removing them from it. To automatically adjust the CPU and memory requests and limits of your workloads, use the vertical Pod autoscaler. When you use the vertical Pod autoscaler, you don't have to decide which values to specify for each container's CPU and memory requests.
- Reduce costs with preemptible virtual machines (VMs). If some of your workloads are tolerant to runtime environments with no availability guarantees, consider deploying those workloads in a node pool composed of preemptible VMs. Preemptible VMs are priced lower than standard Compute Engine VMs, so you can reduce the costs of your clusters.
- Integrate GKE with other products. Some Google Cloud products can integrate with GKE to harden the security of your environment. For example, you can analyze containers for vulnerabilities or use managed base images in Container Registry.
- Design your GKE clusters to be fungible. By considering your clusters as fungible and by automating their provisioning and configuration, you can streamline and generalize the operational processes to maintain them and also simplify future migrations and GKE cluster upgrades. For example, if you need to upgrade a fungible GKE cluster to a new GKE version, you can automatically provision and configure a new, upgraded cluster, automatically deploy workloads in the new cluster, and decommission the old, outdated GKE cluster.
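As an illustration of vertical Pod autoscaling, the following Python sketch builds a VerticalPodAutoscaler object (`autoscaling.k8s.io/v1`) for an existing Deployment. It assumes that vertical Pod autoscaling is enabled on the cluster; the function name and its defaults are hypothetical. It defaults to `updateMode: "Off"`, in which the autoscaler only publishes recommendations, so you can review them before you let it update your Pods.

```python
VALID_UPDATE_MODES = {"Off", "Initial", "Recreate", "Auto"}


def vpa_for_deployment(deployment: str, namespace: str, update_mode: str = "Off",
                       min_allowed=None, max_allowed=None) -> dict:
    """Build a VerticalPodAutoscaler that targets an existing Deployment."""
    if update_mode not in VALID_UPDATE_MODES:
        raise ValueError(f"unsupported updateMode: {update_mode}")
    vpa = {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            # "Off" only computes recommendations; "Auto" lets the autoscaler apply them.
            "updatePolicy": {"updateMode": update_mode},
        },
    }
    if min_allowed or max_allowed:
        # "*" applies the bounds to every container in the Pod.
        policy = {"containerName": "*"}
        if min_allowed:
            policy["minAllowed"] = min_allowed
        if max_allowed:
            policy["maxAllowed"] = max_allowed
        vpa["spec"]["resourcePolicy"] = {"containerPolicies": [policy]}
    return vpa
```

Avoid pairing a vertical Pod autoscaler with a Horizontal Pod Autoscaler that scales the same workload on CPU or memory, because both would react to the same signal.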
Although you can pursue some of these optimization requirements in a self-managed Kubernetes environment, doing so is easier in GKE because you don't have to spend effort keeping the cluster running. Instead, you can focus on the optimization itself.
Complete the optimization
After populating the list of your optimization requirements, you complete the rest of the activities of the optimization phase.
What's next
- Read about how to get started with your migration to Google Cloud.
- Learn how to prepare a GKE environment for production.
- Understand how to harden your cluster's security and read the GKE security overview.
- Automatically bootstrap GKE nodes with DaemonSets.
- Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.