Migrate containers to Google Cloud: Migrate Kubernetes to GKE

Last reviewed 2023-05-22 UTC

This document helps you plan, design, and implement your migration from a self-managed Kubernetes environment to Google Kubernetes Engine (GKE). If done incorrectly, moving apps from one environment to another can be a challenging task, so you need to plan and execute your migration carefully.

This document is part of a multi-part series about migrating to Google Cloud. If you're interested in an overview of the series, see Migrate to Google Cloud: Choose your migration path.

This document is part of a series that discusses migrating containers to Google Cloud:

Migrate containers to Google Cloud: Migrating Kubernetes to GKE (this document)
Migrate containers to Google Cloud: Migrate to a new GKE environment
Migrate containers to Google Cloud: Migrate from OpenShift to GKE Enterprise

This document is useful if you're planning to migrate from a self-managed Kubernetes environment to GKE. Your environment might be running in an on-premises environment, in a private hosting environment, or in another cloud provider. This document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.

By using GKE, you get the following benefits:

You don't have to manage control plane (master) nodes.
You can use Google expertise for security, networking, Kubernetes upgrades, and node auto-provisioning.
You can automatically scale your clusters by adding nodes or by tuning the CPU and memory request limits for Pods.

The following diagram illustrates the path of your migration journey.

Migration path with four phases.

During each migration step, you follow the phases defined in Migrate to Google Cloud: Get started:

Assessing and discovering your workloads.
Planning and building a foundation.
Deploying your workloads.
Optimizing your environment.

Assess your environment

In the assessment phase, you determine the requirements and dependencies to migrate your self-managed Kubernetes environment to GKE:

Build a comprehensive inventory of your apps.
Catalog your apps according to their properties and dependencies.
Train and educate your teams on Google Cloud.
Build an experiment and proof of concept on Google Cloud.
Calculate the total cost of ownership (TCO) of the target environment.
Choose the workloads that you want to migrate first.

The following sections rely on Migrate to Google Cloud: Assess and discover your workloads.

Build your inventories

To scope your migration, you must understand your current Kubernetes environment. You first gather information about your clusters, and then you focus on your workloads deployed in those clusters and the workloads' dependencies. At the end of the assessment phase, you have two inventories: one for your clusters, and one for the workloads deployed in those clusters.

To build the inventory of your clusters, consider the following for each cluster:

Number and type of nodes. When you know how many nodes and the characteristics of each node that you have in your current environment, you size your clusters when you move to GKE. The nodes in your new environment might run on a different hardware-architecture generation than the ones you use in your environment. The performance of each architecture generation is different, so the number of nodes you need in your new environment might be different from your environment. Evaluate any type of hardware that you're using in your nodes, such as high-performance storage devices, GPUs, and TPUs.
Internal or external cluster. Evaluate which actors, either internal to your environment or external, that each cluster is exposed to. To support your use cases, this evaluation includes the workloads running in the cluster, and the interfaces that interact with your clusters.
Multi-tenancy. If you're managing multi-tenant clusters in your environment, assess if it works in your new Google Cloud environment. Now is a good time to evaluate how to improve your multi-tenant clusters because your multi-tenancy strategy influences how you build your foundation on Google Cloud.
Kubernetes version. Gather information about the Kubernetes version of your clusters to assess if there is a mismatch between those versions and the ones available in GKE. If you're running an older or a recently released version, you might be using features that are unavailable in GKE. The features might be deprecated, or the Kubernetes version that ships them is not yet available in GKE.
Kubernetes upgrade cycle. To maintain a reliable environment, understand how you're handling Kubernetes upgrades and how your upgrade cycle relates to GKE upgrades.
Node pools. If you're using any form of node grouping, you might want to consider how these groupings map to the concept of node pools in GKE because your grouping criteria might not be suitable for GKE.
Node initialization. Assess how you initialize each node before marking it as available to run your workloads so you can port those initialization procedures over to GKE.

The following items that you assess in your inventory focus on the security of your infrastructure and Kubernetes clusters:

Namespaces. If you use Kubernetes Namespaces in your clusters to logically separate resources, assess which resources are in each Namespace, and understand why you created this separation. For example, you might be using Namespaces as part of your multi-tenancy strategy. You might have workloads deployed in Namespaces reserved for Kubernetes system components, and you might not have as much control in GKE.
Role-based access control (RBAC). If you use RBAC authorization in your clusters, list a description of all ClusterRoles and ClusterRoleBindings that you configured in your clusters.
Network policies. List all network policies that you configured in your clusters, and understand how network policies work in GKE.
Pod security contexts. Capture information about the Pod security contexts that you configured in your clusters and learn how they work in GKE.
Service accounts. If any process in your cluster is interacting with the Kubernetes API server, capture information about the service accounts that they're using.

After you complete the Kubernetes clusters inventory and assess the security of your environment, build the inventory of the workloads deployed in those clusters. When evaluating your workloads, gather information about the following aspects:

Pods and controllers. To size the clusters in your new environment, assess how many instances of each workload you have deployed, and if you're using Resource quotas and compute resource consumption limits. Gather information about the workloads that are running on the control plane nodes of each cluster and the controllers that each workload uses. For example, how many Deployments are you using? How many DaemonSets are you using?
Jobs and CronJobs. Your clusters and workloads might need to run Jobs or CronJobs as part of their initialization or operation procedures. Assess how many instances of Jobs and CronJobs you have deployed, and the responsibilities and completion criteria for each instance.
Kubernetes Autoscalers. To migrate your autoscaling policies in the new environment, learn how the Horizontal Pod Autoscaler, the Vertical Pod Autoscaler, and the Multidimensional Pod Autoscaler work on GKE.
Stateless and stateful workloads. Stateless workloads don't store data or state in the cluster or to persistent storage. Stateful applications save data for later use. For each workload, assess which components are stateless and which are stateful, because migrating stateful workloads is typically harder than migrating stateless ones.
Kubernetes features. From the cluster inventory, you know which Kubernetes version each cluster runs. Review the release notes of each Kubernetes version to know which features it ships and which features it deprecates. Then assess your workloads against the Kubernetes features that you need. The goal of this task is to know whether you're using deprecated features or features that are not yet available in GKE. If you find any unavailable features, migrate away from deprecated features and adopt the new ones when they're available in GKE.
Storage. For stateful workloads, assess if they use PersistenceVolumeClaims. List any storage requirements, such as size and access mode, and how these PersistenceVolumeClaims map to PersistenceVolumes. To account for future growth, assess if you need to expand any PersistenceVolumeClaim.
Configuration and secret injection. To avoid rebuilding your deployable artifacts every time there is a change in the configuration of your environment, inject configuration and secrets into Pods using ConfigMaps and Secrets. For each workload, assess which ConfigMaps and Secrets that workload is using, and how you're populating those objects.
Dependencies. Your workloads probably don't work in isolation. They might have dependencies, either internal to the cluster, or from external systems. For each workload, capture the dependencies, and if your workloads have any tolerance for when the dependencies are unavailable. For example, common dependencies include distributed file systems, databases, secret distribution platforms, identity and access management systems, service discovery mechanisms, and any other external systems.
Kubernetes Services. To expose your workloads to internal and external clients, use Services. For each Service, you need to know its type. For externally exposed services, assess how that service interacts with the rest of your infrastructure. For example, how is your infrastructure supporting LoadBalancer services and Ingress objects? Which Ingress controllers did you deploy in your clusters?
Service mesh. If you're using a service mesh in your environment, you assess how it's configured. You also need to know how many clusters it spans, which services are part of the mesh, and how you modify the topology of the mesh. For example, are you using an automatic sidecar injection to automatically add sidecars to Kubernetes pods?
Taints and tolerations and affinity and anti-affinity. For each Pod and Node, assess if you configured any Node taints, Pod tolerations, or affinities to customize the scheduling of Pods in your Kubernetes clusters. These properties might also give you insights about possible non-homogeneous Node or Pod configurations, and might mean that either the Pods, the Nodes, or both need to be assessed with special focus and care. For example, if you configured a particular set of Pods to be scheduled only on certain Nodes in your Kubernetes cluster, it might mean that the Pods need specialized resources that are available only on those Nodes.

After you assess your clusters and their workloads, evaluate the rest of the supporting services and aspects in your infrastructure, such as the following:

StorageClasses and PersistentVolumes. Assess how your infrastructure is backing PersistentVolumeClaims by listing StorageClasses for dynamic provisioning, and statically provisioned PersistentVolumes. For each PersistentVolume, consider the following: capacity, volume mode, access mode, class, reclaim policy, mount options, and node affinity.
VolumeSnapshots and VolumeSnapshotContents. For each PersistentVolume, assess if you configured any VolumeSnapshot, and if you need to migrate any existing VolumeSnapshotContents.
Container Storage Interface (CSI) drivers. If deployed in your clusters, assess if these drivers are compatible with GKE, and if you need to adapt the configuration of your volumes to work with CSI drivers that are compatible with GKE.
Data storage. If you depend on external systems to provision PersistentVolumes, provide a way for the workloads in your GKE environment to use those systems. Data locality has an impact on the performance of stateful workloads, because the latency between your external systems and your GKE environment is proportional to the distance between them. For each external data storage system, consider its type, such as block volumes, file storage, or object storage, and any performance and availability requirements that it needs to satisfy.
Logging, monitoring, and tracing. Capture information on your monitoring, logging, and tracing systems. You can integrate your systems with the Google Cloud Observability, or you can use Google Cloud Observability as your only monitoring, logging, and tracing tool. For example, you can integrate Google Cloud Observability with other services, set up logging interfaces for your preferred programming languages, and use the Cloud Logging agent on your VMs. GKE integrates with Google Cloud Observability and Cloud Audit Logs. You can also customize Cloud Logging logs for GKE with Fluentd and then process logs at scale using Dataflow.
Custom resources and Kubernetes add-ons. Collect information about any custom Kubernetes resources and any Kubernetes add-ons that you might have deployed in your clusters, because they might not work in GKE, or you might need to modify them. For example, if a custom resource interacts with an external system, you assess if that's applicable to your Google Cloud environment.

Complete the assessment

After building the inventories related to your Kubernetes clusters and workloads, complete the rest of the activities of the assessment phase in Migrate to Google Cloud: Assess and discover your workloads.

Plan and build your foundation

In the planning and building phase, you provision and configure the cloud infrastructure and services that support your workloads on Google Cloud:

Build a resource hierarchy.
Configure identity and access management.
Set up billing.
Set up network connectivity.
Harden your security.
Set up monitoring and alerting.

If you've already adopted infrastructure-as-code to manage the workloads in your Kubernetes environment, you can apply the same process to your Google Cloud environment. You analyze your Kubernetes descriptors because some Google Cloud resources that GKE automatically provisions for you are configurable by using Kubernetes labels and annotations. For example, you can provision an internal load balancer instead of an external one by adding an annotation to a LoadBalancer Service.

The following sections rely on Migrate to Google Cloud: Build your foundation.

Build a resource hierarchy

To design an efficient resource hierarchy, consider how your business and organizational structures map to Google Cloud as detailed in Migrate to Google Cloud: Build your foundation.

For example, if you need a multi-tenant environment on GKE, you can choose between the following options:

Creating one Google Cloud project for each tenant.
Sharing one project among different tenants, and provisioning multiple GKE clusters.
Using Kubernetes namespaces.

Your choice depends on your isolation, complexity, and scalability needs. For example, having one project per tenant isolates the tenants from one another, but the resource hierarchy becomes more complex to manage due to the high number of projects. However, although managing Kubernetes Namespaces is relatively easier than a complex resource hierarchy, this option doesn't guarantee as much isolation. For example, the control plane might be shared between tenants.

Configure identity and access management

Identity and Access Management provides the tools to centrally configure fine-grained access control to cloud resources. For more information, see Identity and Access Management.

Review how Kubernetes RBAC interacts with identity and access management in Google Cloud, and configure the RBAC according to your requirements that you gathered in the assessment phase.

Set up billing

Before provisioning any Google Cloud resources, configure Cloud Billing and understand the GKE pricing model. For more information, see billing.

Set up network connectivity

Network configuration is a fundamental aspect of your environment. Assess the GKE network model and the connectivity requirements of your workloads. Then, you can start planning your network configuration. For more information, see connectivity and networking.

Harden your security

Understanding the differences between your environment's security model and Google Cloud's model and how to harden the security of your GKE clusters are crucial to protect your critical assets. For more information, see security.

Set up monitoring and alerting

Having a clear picture of how your infrastructure and workloads are performing is key to finding areas of improvement. GKE has deep integrations with Google Cloud Observability, so you get logging and monitoring information about your GKE clusters and workloads inside those clusters. For more information, see monitoring and alerting.

Deploy your workloads

In the deployment phase, you do the following:

Provision and configure your GKE environment.
Configure your GKE clusters.
Migrate data from your source environment to Google Cloud.
Deploy your workloads in your GKE environment.
Validate your workloads.
Expose workloads running on GKE.
Shift traffic from the source environment to the GKE environment.
Decommission the source environment.

Provision and configure your runtime platform and environments

Before moving any workload to your new Google Cloud environment, you provision the GKE clusters.

After the assessment phase, you now know how to provision the GKE clusters in your new Google Cloud environment to meet your needs. You can provision the following:

The number of clusters, the number of nodes per cluster, the types of clusters, the configuration of each cluster and each node, and the scalability plans of each cluster.
The mode of operation of each cluster. GKE offers two modes of operation for clusters: GKE Autopilot and GKE Standard.
The number of private clusters.
The choice between VPC-native or router-based networking.
The Kubernetes versions and release channels that you need in your GKE clusters.
The node pools to logically group the nodes in your GKE clusters, and if you need to automatically create node pools with node auto-provisioning.
The initialization procedures that you can port from your environment to the GKE environment and new procedures that you can implement. For example, you can automatically bootstrap GKE nodes by implementing one or multiple, eventually privileged, initialization procedures for each node or node pool in your clusters.
The scalability plans for each cluster
The additional GKE features that you need, such as Anthos Service Mesh, and GKE add-ons, such as Backup for GKE.

For more information about provisioning GKE clusters, see:

Configure your clusters

After you provision your GKE clusters and before deploying any workload or migrating data, configure namespaces, RBAC, network policies, Resource quotas, and other Kubernetes and GKE objects for each GKE cluster.

To configure Kubernetes and GKE objects in your GKE clusters, we recommend that you:

Ensure that you have the necessary credentials and permissions to access both the clusters in your source environment, and in your GKE environment.
Assess if the objects in the Kubernetes clusters your source environment are compatible with GKE, and how the implementations that back these objects differ from the source environment and GKE.
Refactor any incompatible object to make it compatible with GKE, or retire it.
Migrate these objects to your GKE clusters.
Configure any additional objects that your need in your GKE clusters.

Migrate cluster configuration

To migrate the configuration of your Kubernetes clusters from the source environment to your GKE clusters, you might use the following approach:

If you adopted infrastructure-as-code processes to configure objects in the Kubernetes clusters in your source environment, you can:
1. Migrate objects that are compatible with GKE with only minor metadata changes, such as object names, location, or namespace by using Kubernetes tools (kubectl), or managed services (Config Sync).
2. Refactor or retire objects that aren't compatible with GKE.
If you didn't adopt infrastructure-as-code processes, you can:
- Migrate the configuration of Kubernetes objects from your source environment to your GKE environment using third-party tools, such as Crane, and Velero.

Migrate data

To migrate data that your stateful workloads need from your source environment to your GKE environment, we recommend that you design a data migration plan by following the guidance in Migrate to Google Cloud: Transfer large datasets.

You provision all necessary storage infrastructure before moving your data. If you're using any StorageClass provisioners, you configure them in the new clusters.

For more information about the data storage options that you have on GKE, see storage configuration. For example, you can use Compute Engine persistent disks, either zonal or replicated across a region, or you can use Filestore.

After provisioning StorageClasses, you provision all the necessary PersistentVolumes to store the data to migrate. Then migrate the data from the source environment to these PersistentVolumes. The specifics of this data migration depend on the characteristics of the source environment. For example, you can:

Provision a Compute Engine instance.
Attach a persistent disk to the Compute Engine instance.
Copy data from the source environment to the persistent disk.
Shut down the Compute Engine instance.
Detach the persistent disk from the Compute Engine instance.
Configure the persistent disk as a GKE PersistentVolume.
Decommission the Compute Engine instance.

For more information about using Compute Engine persistent disks as GKE PersistentVolumes, see Using pre-existing persistent disks as PersistentVolumes.

Deploy your workloads

To deploy your workloads, we recommend one of the following approaches:

Implement a deployment process on Google Cloud.
Refactor your existing deployment processes to deploy workloads in your GKE environment.

The deployment phase is also a chance to modernize your deployment processes and your workloads. For example, if you're using any Pods in your environment, consider migrating those workloads to Kubernetes Deployments.

For more information about refactoring your deployment processes, see Migrate to Google Cloud: Migrate from manual deployments to containers and automation. It contains guidance to migrate away from manual deployments to container orchestration tools and automation.

When your deployment processes are ready, you can deploy your workloads to GKE.

Implement a deployment process on Google Cloud

To implement your deployment processes on Google Cloud, use the scalability, the managed operations, and the security-by-design of Google Cloud products.

For more information about implementing your deployment processes on Google Cloud, see:

Refactor your existing deployment processes

Although not strictly necessary for a successful outcome, you can also refactor your deployment processes during the migration. For example, you can modernize and automated your existing deployment processes and implement them on Google Cloud.

Migrating your deployment processes to Google Cloud at the same time as you're migrating workloads, can be complex, and can increase the risk of failing the migration. For particularly complex migrations, you can also consider migrating your deployment process in a second time, and continue using your current ones to deploy workloads in your GKE environment. This approach helps you reduce the complexity of the migration. By continuing to use your existing deployment processes, you can simplify the migration process.

Validate your workloads

After you deploy workloads in your GKE environment, but before you expose these workloads to your users, we recommend that you perform extensive validation and testing. This testing can help you verify that your workloads are behaving as expected. For example, you may:

Perform integration testing, load testing, compliance testing, reliability testing, and other verification procedures that help you ensure that your workloads are operating within their expected parameters, and according to their specifications.
Examine logs, metrics, and error reports in Google Cloud Observability to identify any potential issues, and to spot trends to anticipate problems before they occur.

For more information about workload validation, see Testing for reliability.

Expose your workloads

Once you complete the validation testing of the workloads running in your GKE environment, expose your workloads to make them reachable.

To expose workloads running in your GKE environment, you can use Kubernetes Services, and a service mesh.

For more information about how GKE supports Kubernetes services, see Services.

For more information about exposing workloads running in GKE, see:

Shift traffic to your Google Cloud environment

After you have verified that the workloads are running in your GKE environment, and after you have exposed them to clients, you shift traffic from your source environment to your GKE environment. To help you avoid big-scale migrations and all the related risks, we recommend that you gradually shift traffic from your source environment to your GKE.

Depending on how you designed your GKE environment, you have several options to implement a load balancing mechanism that gradually shifts traffic from your source environment to your target environment. For example, you may implement a DNS resolution policy that resolves DNS records according to some policy to resolve a certain percentage of requests to IP addresses belonging to your GKE environment. Or you can implement a load balancing mechanism using virtual IP addresses and network load balancers.

After you start gradually shifting traffic to your GKE environment, we recommend that you monitor how your workloads behave as their loads increase.

Finally, you perform a cutover, which happens when you shift all the traffic from your source environment to your GKE environment.

For more information about load balancing, see Load balancing at the frontend

Decommission the source environment

After the workloads in your GKE environment are serving requests correctly, you decommission your source environment.

Before you start decommissioning resources in your source environment, we recommend that you do the following:

Back up any data to help you restore resources in your source environment.
Notify your users before decommissioning the environment.

To decommission your source environment, do the following:

Decommission the workloads running in the clusters in your source environment.
Delete the clusters in your source environment.
Delete the resources associated with these clusters, such as security groups, load balancers, and virtual networks.

To avoid leaving orphaned resources, the order in which you decommission the resources in your source environment is important. For example, certain providers require that you decommission Kubernetes Services that lead to the creation of load balancers before being able to decommission the virtual networks containing those load balancers.

Optimize your environment

Optimization is the last phase of your migration. In this phase, you make your environment more efficient than it was before. In this phase, you execute multiple iterations of a repeatable loop until your environment meets your optimization requirements. The steps of this repeatable loop are as follows:

Assessing your current environment, teams, and optimization loop.
Establishing your optimization requirements and goals.
Optimizing your environment and your teams.
Tuning the optimization loop.

The following sections rely on Migrate to Google Cloud: Optimize your environment.

Assess your current environment, teams, and optimization loop

While the first assessment focuses on the migration from your environment to GKE, this assessment is tailored for the optimization phase.

Establish your optimization requirements

Review the following optimization requirements for your GKE environment:

Implement advanced deployment processes. Processes like canary deployments or blue/green deployments give you added flexibility and can increase the reliability of your environment, extend testing, and reduce the impact of any issue for your users.
Configure a service mesh. By introducing a service mesh to your environment, you use features like observability, traffic management, and mutual authentication for your services, and reduce the strain on your DevOps teams. You can deploy a multi-cluster service mesh to better segment your workloads or an expanded service mesh to support your migration to the new environment.
Set up automatic scaling. You have different, complementary options to automatically scale your GKE environment. You can automatically scale your clusters and the workloads inside each cluster. By configuring the cluster autoscaler, you can automatically resize a GKE cluster based on the demands of your workloads, by adding or removing worker nodes to the cluster. If you want to automatically scale your workloads, you adjust the CPU and memory consumption requests and limits with the vertical Pod autoscaler. When you use the autoscaler, you don't have to think about the values to specify for each container's CPU and memory requests. You can also export the metrics provided by the autoscaler to right-size GKE workloads at scale.
Reduce costs with preemptible virtual machines (VMs). If some of your workloads are tolerant to runtime environments with no availability guarantees, consider deploying those workloads in a node pool composed of preemptible VMs. Preemptible VMs are priced lower than standard Compute Engine VMs, so you can reduce the costs of your clusters.
Integrate GKE with other products. Some Google Cloud products can integrate with GKE to harden the security of your environment. For example, you can analyze containers for vulnerabilities or use managed base images in Container Registry.
Design your GKE clusters to be fungible. By considering your clusters as fungible and by automating their provisioning and configuration, you can streamline and generalize the operational processes to maintain them and also simplify future migrations and GKE cluster upgrades. For example, if you need to upgrade a fungible GKE cluster to a new GKE version, you can automatically provision and configure a new, upgraded cluster, automatically deploy workloads in the new cluster, and decommission the old, outdated GKE cluster.
Architect a multi-cluster environment. By implementing a multi-cluster environment on GKE, you:
- Reduce the chances of introducing single points of failure in your architecture.
- Benefit from the greater flexibility of the possibility of testing configuration changes on a subset of your GKE clusters.
- Distribute workloads across GKE clusters.

Although you can pursue some of these optimization requirements in a Kubernetes environment, it is easier in GKE because you don't have to spend the effort to keep the cluster running. Instead, you can focus on the optimization itself.

Complete the optimization

After populating the list of your optimization requirements, you complete the rest of the activities of the optimization phase.

What's next

Read about how to get started with your migration to Google Cloud.
Understand how to harden your cluster's security and read the GKE security overview.
Automatically bootstrap GKE nodes with DaemonSets.
Find help for your migration to Google Cloud.
Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.