This document helps you plan, design, and implement your migration from a self-managed Kubernetes environment to Google Kubernetes Engine (GKE). If done incorrectly, moving apps from one environment to another can be a challenging task, so you need to plan and execute your migration carefully.
This document is useful if you're planning to migrate from a self-managed Kubernetes environment to GKE. Your environment might be running in an on-premises environment, in a private hosting environment, or in another cloud provider. This document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.
GKE is a Google-managed Kubernetes service that you can use to deploy and operate containerized applications at scale using Google's infrastructure. GKE provides features that help you manage your Kubernetes environment, such as:
- Two editions: GKE Standard and GKE Enterprise. With GKE Standard, you get access to a standard tier of core features. With GKE Enterprise, you get access to all the capabilities of GKE. For more information, see GKE editions.
- Two modes of operation: Standard and Autopilot. With Standard, you manage the underlying infrastructure and the configuration of each node in your GKE cluster. With Autopilot, GKE manages the underlying infrastructure such as node configuration, autoscaling, auto-upgrades, baseline security and network configuration. For more information about GKE modes of operation, see Choose a GKE mode of operation.
- Industry-unique service level agreement for Pods when using Autopilot in multiple zones.
- Automated node pool creation and deletion with node auto-provisioning.
- Google-managed multi-cluster networking to help you design and implement highly available, distributed architectures for your workloads.
For more information about GKE, see GKE overview.
For this migration to Google Cloud, we recommend that you follow the migration framework described in Migrate to Google Cloud: Get started.
The following diagram illustrates the path of your migration journey.
You might migrate from your source environment to Google Cloud in a series of iterations—for example, you might migrate some workloads first and others later. For each separate migration iteration, you follow the phases of the general migration framework:
- Assess and discover your workloads and data.
- Plan and build a foundation on Google Cloud.
- Migrate your workloads and data to Google Cloud.
- Optimize your Google Cloud environment.
For more information about the phases of this framework, see Migrate to Google Cloud: Get started.
To design an effective migration plan, we recommend that you validate each step of the plan, and ensure that you have a rollback strategy. To help you validate your migration plan, see Migrate to Google Cloud: Best practices for validating a migration plan.
Assess your environment
In the assessment phase, you determine the requirements and dependencies to migrate your source environment to Google Cloud.
The assessment phase is crucial for the success of your migration. You need to gain deep knowledge about the workloads you want to migrate, their requirements, their dependencies, and about your current environment. You need to understand your starting point to successfully plan and execute a Google Cloud migration.
The assessment phase consists of the following tasks:
- Build a comprehensive inventory of your workloads.
- Catalog your workloads according to their properties and dependencies.
- Train and educate your teams on Google Cloud.
- Build experiments and proofs of concept on Google Cloud.
- Calculate the total cost of ownership (TCO) of the target environment.
- Choose the migration strategy for your workloads.
- Choose your migration tools.
- Define the migration plan and timeline.
- Validate your migration plan.
For more information about the assessment phase and these tasks, see Migrate to Google Cloud: Assess and discover your workloads. The following sections are based on information in that document.
Build your inventories
To scope your migration, you create two inventories:
- The inventory of your clusters.
- The inventory of your workloads that are deployed in those clusters.
After you build these inventories, you:
- Assess your deployment and operational processes for your source environment.
- Assess supporting services and external dependencies.
Build the inventory of your clusters
To build the inventory of your clusters, consider the following for each cluster:
- Number and type of nodes. When you know how many nodes you have in your current environment and the characteristics of each node, you can size your clusters when you move to GKE. The nodes in your new environment might run on a different hardware architecture or generation than the ones you use in your current environment. The performance of each architecture and generation is different, so the number of nodes you need in your new environment might be different from your current environment. Evaluate any type of hardware that you're using in your nodes, such as high-performance storage devices, GPUs, and TPUs. Assess which operating system image you're using on your nodes.
- Internal or external cluster. Evaluate which actors, either internal to your environment or external, each cluster is exposed to. To support your use cases, this evaluation includes the workloads running in the cluster, and the interfaces that interact with your clusters.
- Multi-tenancy. If you're managing multi-tenant clusters in your environment, assess whether your multi-tenancy approach works in your new Google Cloud environment. Now is a good time to evaluate how to improve your multi-tenant clusters because your multi-tenancy strategy influences how you build your foundation on Google Cloud.
- Kubernetes version. Gather information about the Kubernetes version of your clusters to assess if there is a mismatch between those versions and the ones available in GKE. If you're running an older or a recently released Kubernetes version, you might be using features that are unavailable in GKE. The features might be deprecated, or the Kubernetes version that ships them is not yet available in GKE.
- Kubernetes upgrade cycle. To maintain a reliable environment, understand how you're handling Kubernetes upgrades and how your upgrade cycle relates to GKE upgrades.
- Node pools. If you're using any form of node grouping, you might want to consider how these groupings map to the concept of node pools in GKE because your grouping criteria might not be suitable for GKE.
- Node initialization. Assess how you initialize each node before marking it as available to run your workloads so you can port those initialization procedures over to GKE.
- Network configuration. Assess the network configuration of your clusters, their IP address allocation, how you configured their networking plugins, how you configured their DNS servers and DNS service providers, if you configured any form of NAT or SNAT for these clusters, and whether they are part of a multi-cluster environment.
- Compliance. Assess any compliance and regulatory requirements that your clusters are required to satisfy, and whether you're meeting these requirements.
- Quotas and limits. Assess how you configured quotas and limits for your clusters. For example, how many Pods can each node run? How many nodes can a cluster have?
- Labels and tags. Assess any metadata that you applied to clusters, node pools, and nodes, and how you're using them. For example, you might be generating reports with detailed, label-based cost attribution.
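The node count and node type item in the preceding list can be turned into a rough first estimate. The following is a minimal sketch, assuming that you only compare raw CPU and memory capacity. It ignores performance differences between hardware generations, and the `NodeGroup` type, the sample inventory, and the `headroom_percent` margin are all hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A group of identical nodes taken from the source cluster inventory."""
    count: int
    cpu_cores: int
    memory_gib: int


def estimate_target_nodes(groups, target_cpu_cores, target_memory_gib, headroom_percent=20):
    """Estimate how many target nodes match the inventoried raw capacity.

    headroom_percent is a hypothetical safety margin, not a GKE setting.
    """
    total_cpu = sum(g.count * g.cpu_cores for g in groups)
    total_mem = sum(g.count * g.memory_gib for g in groups)
    factor = 100 + headroom_percent
    by_cpu = math.ceil(total_cpu * factor / (100 * target_cpu_cores))
    by_mem = math.ceil(total_mem * factor / (100 * target_memory_gib))
    # The scarcer resource determines the node count.
    return max(by_cpu, by_mem)


# Hypothetical inventory: six small nodes and two memory-heavy nodes.
inventory = [NodeGroup(count=6, cpu_cores=8, memory_gib=32),
             NodeGroup(count=2, cpu_cores=16, memory_gib=128)]
print(estimate_target_nodes(inventory, target_cpu_cores=16, target_memory_gib=64))
```

Treat the result as a starting point for experiments and proofs of concept, not as a final sizing decision.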
The following items that you assess in your inventory focus on the security of your infrastructure and Kubernetes clusters:
- Namespaces. If you use Kubernetes Namespaces in your clusters to logically separate resources, assess which resources are in each Namespace, and understand why you created this separation. For example, you might be using Namespaces as part of your multi-tenancy strategy. You might have workloads deployed in Namespaces reserved for Kubernetes system components, and you might not have as much control over those Namespaces in GKE.
- Role-based access control (RBAC). If you use RBAC authorization in your clusters, list a description of all ClusterRoles and ClusterRoleBindings that you configured in your clusters.
- Network policies. List all network policies that you configured in your clusters, and understand how network policies work in GKE.
- Pod security contexts. Capture information about the Pod security contexts that you configured in your clusters and learn how they work in GKE.
- Service accounts. If any process in your cluster is interacting with the Kubernetes API server, capture information about the service accounts that they're using.
When you build the inventory of your Kubernetes clusters, you might find that some of the clusters need to be decommissioned as part of your migration. Make sure that your migration plan includes retiring these resources.
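To support the Namespace assessment described earlier, the following sketch flags objects that live in Namespaces that Kubernetes reserves for system components. It assumes that you already exported your objects as parsed JSON dictionaries (for example, from `kubectl get ... -o json`). The sample objects are hypothetical, and the result also includes genuine system components, so you need to review it manually.

```python
# Namespaces that Kubernetes creates for its own components.
SYSTEM_NAMESPACES = {"kube-system", "kube-public", "kube-node-lease"}


def objects_in_system_namespaces(objects):
    """Return sorted (namespace, kind, name) tuples for objects in system Namespaces."""
    found = []
    for obj in objects:
        meta = obj.get("metadata", {})
        namespace = meta.get("namespace", "default")
        if namespace in SYSTEM_NAMESPACES:
            found.append((namespace, obj.get("kind", "Unknown"), meta.get("name", "")))
    return sorted(found)


# Hypothetical export of three workloads.
exported = [
    {"kind": "Deployment", "metadata": {"name": "ingress-custom", "namespace": "kube-system"}},
    {"kind": "Deployment", "metadata": {"name": "shop", "namespace": "retail"}},
    {"kind": "DaemonSet", "metadata": {"name": "log-agent", "namespace": "kube-system"}},
]
for entry in objects_in_system_namespaces(exported):
    print(entry)
```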
Build the inventory of your Kubernetes workloads
After you complete the Kubernetes clusters inventory and assess the security of your environment, build the inventory of the workloads deployed in those clusters. When evaluating your workloads, gather information about the following aspects:
- Pods and controllers. To size the clusters in your new environment, assess how many instances of each workload you have deployed, and if you're using Resource quotas and compute resource consumption limits. Gather information about the workloads that are running on the control plane nodes of each cluster and the controllers that each workload uses. For example, how many Deployments are you using? How many DaemonSets are you using?
- Jobs and CronJobs. Your clusters and workloads might need to run Jobs or CronJobs as part of their initialization or operation procedures. Assess how many instances of Jobs and CronJobs you have deployed, and the responsibilities and completion criteria for each instance.
- Kubernetes Autoscalers. To migrate your autoscaling policies to the new environment, learn how the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler work on GKE.
- Stateless and stateful workloads. Stateless workloads don't store data or state in the cluster or to persistent storage. Stateful applications save data for later use. For each workload, assess which components are stateless and which are stateful, because migrating stateful workloads is typically harder than migrating stateless ones.
- Kubernetes features. From the cluster inventory, you know which Kubernetes version each cluster runs. Review the release notes of each Kubernetes version to know which features it ships and which features it deprecates. Then assess your workloads against the Kubernetes features that you need. The goal of this task is to know whether you're using deprecated features or features that are not yet available in GKE. If you find any unavailable features, migrate away from deprecated features and adopt the new ones when they're available in GKE.
- Storage. For stateful workloads, assess if they use PersistentVolumeClaims. List any storage requirements, such as size and access mode, and how these PersistentVolumeClaims map to PersistentVolumes. To account for future growth, assess if you need to expand any PersistentVolumeClaim.
- Configuration and secret injection. To avoid rebuilding your deployable artifacts every time there is a change in the configuration of your environment, inject configuration and secrets into Pods using ConfigMaps and Secrets. For each workload, assess which ConfigMaps and Secrets that workload is using, and how you're populating those objects.
- Dependencies. Your workloads probably don't work in isolation. They might have dependencies, either internal to the cluster, or from external systems. For each workload, capture the dependencies, and if your workloads have any tolerance for when the dependencies are unavailable. For example, common dependencies include distributed file systems, databases, secret distribution platforms, identity and access management systems, service discovery mechanisms, and any other external systems.
- Kubernetes Services. To expose your workloads to internal and external clients, use Services. For each Service, you need to know its type. For externally exposed services, assess how that service interacts with the rest of your infrastructure. For example, how is your infrastructure supporting LoadBalancer services, Gateway objects, and Ingress objects? Which Ingress controllers did you deploy in your clusters?
- Service mesh. If you're using a service mesh in your environment, assess how it's configured. You also need to know how many clusters it spans, which services are part of the mesh, and how you modify the topology of the mesh.
- Taints and tolerations, and affinity and anti-affinity. For each Pod and Node, assess if you configured any Node taints, Pod tolerations, or affinities to customize the scheduling of Pods in your Kubernetes clusters. These properties might also give you insights about possible non-homogeneous Node or Pod configurations, and might mean that either the Pods, the Nodes, or both need to be assessed with special focus and care. For example, if you configured a particular set of Pods to be scheduled only on certain Nodes in your Kubernetes cluster, it might mean that the Pods need specialized resources that are available only on those Nodes.
- Authentication. Assess how your workloads authenticate against resources in your cluster, and against external resources.
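The controller counts and the stateless or stateful split from the preceding list can be derived from exported manifests. The following sketch assumes parsed JSON or YAML dictionaries and uses a simple heuristic: a workload counts as stateful if it is a StatefulSet or if its Pod template mounts a PersistentVolumeClaim. Workloads that keep state in external systems are not detected, and the sample data is hypothetical.

```python
from collections import Counter


def uses_persistent_storage(obj):
    """True if the object is a StatefulSet or its Pod template mounts a PVC."""
    if obj.get("kind") == "StatefulSet":
        return True
    pod_spec = obj.get("spec", {}).get("template", {}).get("spec", {})
    return any("persistentVolumeClaim" in volume for volume in pod_spec.get("volumes", []))


def summarize_workloads(objects):
    """Return (count per controller kind, sorted names of likely stateful workloads)."""
    kinds = Counter(obj.get("kind", "Unknown") for obj in objects)
    stateful = sorted(obj["metadata"]["name"] for obj in objects if uses_persistent_storage(obj))
    return kinds, stateful


# Hypothetical workload export.
workloads = [
    {"kind": "Deployment", "metadata": {"name": "web"},
     "spec": {"template": {"spec": {"volumes": [{"name": "cache", "emptyDir": {}}]}}}},
    {"kind": "Deployment", "metadata": {"name": "reports"},
     "spec": {"template": {"spec": {"volumes": [
         {"name": "data", "persistentVolumeClaim": {"claimName": "reports-data"}}]}}}},
    {"kind": "StatefulSet", "metadata": {"name": "db"}, "spec": {"template": {"spec": {}}}},
    {"kind": "DaemonSet", "metadata": {"name": "log-agent"}, "spec": {"template": {"spec": {}}}},
]
kinds, stateful = summarize_workloads(workloads)
print(dict(kinds), stateful)
```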
Assess supporting services and external dependencies
After you assess your clusters and their workloads, evaluate the rest of the supporting services and aspects in your infrastructure, such as the following:
- StorageClasses and PersistentVolumes. Assess how your infrastructure is backing PersistentVolumeClaims by listing StorageClasses for dynamic provisioning, and statically provisioned PersistentVolumes. For each PersistentVolume, consider the following: capacity, volume mode, access mode, class, reclaim policy, mount options, and node affinity.
- VolumeSnapshots and VolumeSnapshotContents. For each PersistentVolume, assess if you configured any VolumeSnapshot, and if you need to migrate any existing VolumeSnapshotContents.
- Container Storage Interface (CSI) drivers. If deployed in your clusters, assess if these drivers are compatible with GKE, and if you need to adapt the configuration of your volumes to work with CSI drivers that are compatible with GKE.
- Data storage. If you depend on external systems to provision PersistentVolumes, provide a way for the workloads in your GKE environment to use those systems. Data locality has an impact on the performance of stateful workloads, because the latency between your external systems and your GKE environment is proportional to the distance between them. For each external data storage system, consider its type, such as block volumes, file storage, or object storage, and any performance and availability requirements that it needs to satisfy.
- Custom resources and Kubernetes add-ons. Collect information about any custom Kubernetes resources and any Kubernetes add-ons that you might have deployed in your clusters, because they might not work in GKE, or you might need to modify them. For example, if a custom resource interacts with an external system, assess whether that interaction is applicable to your Google Cloud environment.
- Backup. Assess how you're backing up the configuration of your clusters and stateful workload data in your source environment.
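To get a first picture of how much storage each StorageClass backs, you can total the capacity of your exported PersistentVolumes. The following sketch assumes parsed PersistentVolume objects and reads the `spec.capacity.storage` and `spec.storageClassName` fields. The quantity parser is deliberately limited to whole numbers with a small set of suffixes, and the sample volumes are hypothetical.

```python
from collections import defaultdict

# Subset of Kubernetes quantity suffixes handled by this sketch.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text):
    """Convert a whole-number quantity such as '10Gi' into bytes."""
    # Check two-character suffixes first so that 'Gi' is not read as 'G'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(text)


def capacity_by_storage_class(persistent_volumes):
    """Sum PersistentVolume capacity in bytes for each StorageClass name."""
    totals = defaultdict(int)
    for pv in persistent_volumes:
        spec = pv["spec"]
        class_name = spec.get("storageClassName") or "(none)"
        totals[class_name] += parse_quantity(spec["capacity"]["storage"])
    return dict(totals)


# Hypothetical PersistentVolume export.
persistent_volumes = [
    {"spec": {"storageClassName": "fast", "capacity": {"storage": "10Gi"}}},
    {"spec": {"storageClassName": "fast", "capacity": {"storage": "512Mi"}}},
    {"spec": {"capacity": {"storage": "1G"}}},
]
print(capacity_by_storage_class(persistent_volumes))
```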
Assess your deployment and operational processes
It's important to have a clear understanding of how your deployment and operational processes work. These processes are a fundamental part of the practices that prepare and maintain your production environment and the workloads that run there.
Your deployment and operational processes might build the artifacts that your workloads need to function. Therefore, you should gather information about each artifact type. For example, an artifact can be an operating system package, an application deployment package, an operating system image, a container image, or something else.
In addition to the artifact type, consider how you complete the following tasks:
- Develop your workloads. Assess the processes that development teams have in place to build your workloads. For example, how are your development teams designing, coding, and testing your workloads?
- Generate the artifacts that you deploy in your source environment. To deploy your workloads in your source environment, you might be generating deployable artifacts, such as container images or operating system images, or you might be customizing existing artifacts, such as third-party operating system images by installing and configuring software. Gathering information about how you're generating these artifacts helps you to ensure that the generated artifacts are suitable for deployment in Google Cloud.
- Store the artifacts. If you produce artifacts that you store in an artifact registry in your source environment, you need to make the artifacts available in your Google Cloud environment. You can do so by employing strategies like the following:
  - Establish a communication channel between the environments: Make the artifacts in your source environment reachable from the target Google Cloud environment.
  - Refactor the artifact build process: Complete a minor refactor of your source environment so that you can store artifacts in both the source environment and the target environment. This approach supports your migration by building infrastructure like an artifact repository before you have to implement artifact build processes in the target Google Cloud environment. You can implement this approach directly, or you can build on the previous approach of establishing a communication channel first.
  Having artifacts available in both the source and target environments lets you focus on the migration without having to implement artifact build processes in the target Google Cloud environment as part of the migration.
- Scan and sign code. As part of your artifact build processes, you might be using code scanning to help you guard against common vulnerabilities and unintended network exposure, and code signing to help you ensure that only trusted code runs in your environments.
- Deploy artifacts in your source environment. After you generate deployable artifacts, you might be deploying them in your source environment. We recommend that you assess each deployment process. The assessment helps ensure that your deployment processes are compatible with Google Cloud. It also helps you to understand the effort that will be necessary to eventually refactor the processes. For example, if your deployment processes work with your source environment only, you might need to refactor them to target your Google Cloud environment.
- Inject runtime configuration. You might be injecting runtime configuration for specific clusters, runtime environments, or workload deployments. The configuration might initialize environment variables and other configuration values such as secrets, credentials, and keys. To help ensure that your runtime configuration injection processes work on Google Cloud, we recommend that you assess how you're configuring the workloads that run in your source environment.
- Logging, monitoring, and profiling. Assess the logging, monitoring, and profiling processes that you have in place to monitor the health of your source environment, the metrics of interest, and how you're consuming data provided by these processes.
- Authentication. Assess how you're authenticating against your source environment.
- Provision and configure your resources. To prepare your source environment, you might have designed and implemented processes that provision and configure resources. For example, you might be using Terraform along with configuration management tools to provision and configure resources in your source environment.
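If you store container images in both environments, as described earlier for the artifact storage task, each source image reference needs a counterpart in the target registry. The following sketch assumes that the target is an Artifact Registry Docker repository, whose host follows the `LOCATION-docker.pkg.dev` pattern. The location, project, and repository names are hypothetical placeholders, and image references that carry a digest are not treated specially.

```python
def to_artifact_registry(image, location, project, repository):
    """Rewrite an image reference to point at an Artifact Registry repository."""
    first, _, rest = image.partition("/")
    # Docker treats the first component as a registry host only if it
    # contains "." or ":" or is "localhost".
    has_host = bool(rest) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_host else image
    return f"{location}-docker.pkg.dev/{project}/{repository}/{path}"


# Hypothetical source images and target repository.
for source in ("registry.example.internal:5000/team/api:1.4.2", "nginx:1.25"):
    print(source, "->", to_artifact_registry(source, "us-central1", "my-project", "migrated"))
```

A build process could push each artifact under both names until the source registry is retired.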
Plan and build your foundation
In the plan and build phase, you provision and configure the infrastructure to do the following:
- Support your workloads in your Google Cloud environment.
- Connect your source environment and your Google Cloud environment to complete the migration.
The plan and build phase is composed of the following tasks:
- Build a resource hierarchy.
- Configure Google Cloud's Identity and Access Management (IAM).
- Set up billing.
- Set up network connectivity.
- Harden your security.
- Set up logging, monitoring, and alerting.
For more information about each of these tasks, see Migrate to Google Cloud: Plan and build your foundation.
The following sections integrate the considerations in Migrate to Google Cloud: Plan and build your foundation.
Plan for multi-tenancy
To design an efficient resource hierarchy, consider how your business and organizational structures map to Google Cloud. For example, if you need a multi-tenant environment on GKE, you can choose between the following options:
- Creating one Google Cloud project for each tenant.
- Sharing one project among different tenants, and provisioning multiple GKE clusters.
- Using Kubernetes namespaces.
Your choice depends on your isolation, complexity, and scalability needs. For example, having one project per tenant isolates the tenants from one another, but the resource hierarchy becomes more complex to manage due to the high number of projects. Conversely, although managing Kubernetes Namespaces is easier than managing a complex resource hierarchy, this option doesn't guarantee as much isolation. For example, the control plane might be shared between tenants. For more information, see Cluster multi-tenancy.
Configure identity and access management
GKE supports multiple options for managing access to resources within your Google Cloud project and its clusters, such as IAM and Kubernetes RBAC. For more information, see Access control.
Configure GKE networking
Network configuration is a fundamental aspect of your environment. Before you provision and configure any cluster, we recommend that you assess the GKE network model, the best practices for GKE networking, and how to plan IP addresses when migrating to GKE.
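Part of IP address planning is checking that the ranges you intend to use in GKE don't collide with ranges that remain in use in your source environment. The following sketch uses Python's standard `ipaddress` module for that check. All range names and CIDR values are hypothetical, and the `per_node_prefix` default of 24 is an assumption for this sketch, so verify the block size that applies to your maximum Pods per node setting.

```python
import ipaddress


def find_overlaps(ranges):
    """Return pairs of named CIDR ranges that overlap, in sorted name order."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    names = sorted(parsed)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if parsed[a].overlaps(parsed[b])]


def node_blocks(pod_range, per_node_prefix=24):
    """Count how many per-node blocks of the given prefix length fit in a Pod range."""
    network = ipaddress.ip_network(pod_range)
    if per_node_prefix < network.prefixlen:
        return 0
    return 2 ** (per_node_prefix - network.prefixlen)


# Hypothetical plan: one source range that stays connected, three planned GKE ranges.
planned = {
    "source-nodes": "10.0.0.0/16",
    "gke-nodes": "10.10.0.0/20",
    "gke-pods": "10.0.128.0/17",
    "gke-services": "10.20.0.0/20",
}
print(find_overlaps(planned))
print(node_blocks("10.4.0.0/14"))
```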
Set up monitoring and alerting
Having a clear picture of how your infrastructure and workloads are performing is key to finding areas of improvement. GKE has deep integrations with Google Cloud Observability, so you get logging, monitoring, and profiling information about your GKE clusters and workloads inside those clusters.
Migrate data and deploy your workloads
In the deployment phase, you do the following:
- Provision and configure your GKE environment.
- Configure your GKE clusters.
- Refactor your workloads.
- Refactor deployment and operational processes.
- Migrate data from your source environment to Google Cloud.
- Deploy your workloads in your GKE environment.
- Validate your workloads and GKE environment.
- Expose workloads running on GKE.
- Shift traffic from the source environment to the GKE environment.
- Decommission the source environment.
Provision and configure your Google Cloud environment
Before moving any workload to your new Google Cloud environment, you provision the GKE clusters.
GKE supports enabling certain features on existing clusters, but there might be features that you can only enable at cluster creation time. To help you avoid disruptions and simplify the migration, we recommend that you enable the cluster features that you need at cluster creation time. Otherwise, you might need to destroy and recreate your clusters if the cluster features that you need can't be enabled after cluster creation.
After the assessment phase, you now know how to provision the GKE clusters in your new Google Cloud environment to meet your needs. To provision your clusters, consider the following:
- The number of clusters, the number of nodes per cluster, the types of clusters, and the configuration of each cluster and each node.
- The mode of operation of each cluster. GKE offers two modes of operation for clusters: GKE Autopilot and GKE Standard.
- The number of private clusters.
- The choice between VPC-native and routes-based networking.
- The Kubernetes versions and release channels that you need in your GKE clusters.
- The node pools to logically group the nodes in your GKE clusters, and if you need to automatically create node pools with node auto-provisioning.
- The initialization procedures that you can port from your environment to the GKE environment and new procedures that you can implement. For example, you can automatically bootstrap GKE nodes by implementing one or more initialization procedures, which might need to be privileged, for each node or node pool in your clusters.
- The scalability plans for each cluster.
- The additional GKE features that you need, such as Cloud Service Mesh, and GKE add-ons, such as Backup for GKE.
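When you decide which Kubernetes versions you need, it helps to compare the versions from your cluster inventory with the versions that you can select in GKE. The following sketch compares versions at the minor-version level only. The list of available versions is a hypothetical input that you would fill in yourself, and the version strings are sample values.

```python
def minor_version(version):
    """Extract (major, minor) from strings such as 'v1.27.3' or '1.27.3-gke.100'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def unavailable_versions(source_versions, available_versions):
    """Return source versions whose minor version is missing from the available set."""
    available = {minor_version(v) for v in available_versions}
    return sorted(v for v in source_versions if minor_version(v) not in available)


# Hypothetical inventory and hypothetical list of selectable versions.
source = ["v1.24.9", "v1.29.1"]
selectable = ["1.28.5-gke.100", "1.29.1-gke.200"]
print(unavailable_versions(source, selectable))
```

Clusters that show up in the result are candidates for an upgrade in the source environment before you migrate, or for a closer review of deprecated features.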
For more information about provisioning GKE clusters, see:
- About cluster configuration choices.
- Manage, configure, and deploy GKE clusters.
- Understanding GKE security.
- Harden your cluster's security.
- GKE networking overview.
- Best practices for GKE networking.
- Storage for GKE clusters overview.
Fleet management
When you provision your GKE clusters, you might realize that you need a large number of them to support all the use cases of your environment. For example, you might need to separate production from non-production environments, or separate services across teams or geographies. For more information, see multi-cluster use cases.
As the number of clusters increases, your GKE environment might become harder to operate because managing a large number of clusters poses significant scalability and operational challenges. GKE provides tools and features to help you manage fleets, which are logical groupings of Kubernetes clusters. For more information, see Fleet management.
Multi-cluster networking
To help you improve the reliability of your GKE environment, and to distribute your workloads across several GKE clusters, you can use:
- Multi-Cluster Service Discovery, a cross-cluster service discovery and invocation mechanism. Services are discoverable and accessible across GKE clusters. For more information, see Multi-Cluster Service Discovery.
- Multi-cluster gateways, a cross-cluster ingress traffic load balancing mechanism. For more information, see Deploying multi-cluster Gateways.
- Multi-cluster mesh on managed Cloud Service Mesh. For more information, see Set up a multi-cluster mesh.
For more information about migrating from a single-cluster GKE environment to a multi-cluster GKE environment, see Migrate to multi-cluster networking.
Configure your GKE clusters
After you provision your GKE clusters and before deploying any workload or migrating data, you configure namespaces, RBAC, network policies, service accounts, and other Kubernetes and GKE objects for each GKE cluster.
To configure Kubernetes and GKE objects in your GKE clusters, we recommend that you:
- Ensure that you have the necessary credentials and permissions to access the clusters in both your source environment and your GKE environment.
- Assess if the objects in the Kubernetes clusters in your source environment are compatible with GKE, and how the implementations that back these objects differ between the source environment and GKE.
- Refactor any incompatible object to make it compatible with GKE, or retire it.
- Create these objects in your GKE clusters.
- Configure any additional objects that you need in your GKE clusters.
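Objects exported from a running cluster carry fields that the source API server populated, and you typically don't want to carry those fields over when you create the objects in GKE. The following sketch removes such fields from a parsed object. The field list is an assumption that covers common cases rather than an exhaustive rule, and the sample object is hypothetical.

```python
import copy

# Metadata fields that the API server populates (assumed list for this sketch).
SERVER_METADATA_FIELDS = ("uid", "resourceVersion", "creationTimestamp",
                          "generation", "managedFields", "selfLink")


def portable_copy(obj):
    """Return a copy of an exported object without server-populated fields."""
    cleaned = copy.deepcopy(obj)
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_METADATA_FIELDS:
        metadata.pop(field, None)
    return cleaned


# Hypothetical exported Namespace.
exported_namespace = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "retail", "labels": {"team": "shop"},
                 "uid": "1234", "resourceVersion": "987",
                 "creationTimestamp": "2023-01-01T00:00:00Z"},
    "status": {"phase": "Active"},
}
print(portable_copy(exported_namespace))
```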
Config Sync
To help you adopt GitOps best practices to manage the configuration of your GKE clusters as your GKE environment scales, we recommend that you use Config Sync, a GitOps service to deploy configurations from a source of truth. For example, you can store the configuration of your GKE clusters in a Git repository, and use Config Sync to apply that configuration.
For more information, see Config Sync architecture.
Policy Controller
Policy Controller helps you apply and enforce programmable policies to help ensure that your GKE clusters and workloads run in a secure and compliant manner. As your GKE environment scales, you can use Policy Controller to automatically apply policies, policy bundles, and constraints to all your GKE clusters. For example, you can restrict the repositories from which container images can be pulled, or you can require each namespace to have at least one label to help you ensure accurate resource consumption tracking.
For more information, see Policy Controller.
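Before you enforce policies such as the two examples in this section, you might want to estimate how many existing objects would violate them. The following sketch is plain Python that runs against exported, parsed objects. It is not Policy Controller's constraint format, and the repository prefix and sample objects are hypothetical.

```python
def namespace_violations(namespaces):
    """Return sorted names of Namespaces that carry no labels."""
    return sorted(ns["metadata"]["name"] for ns in namespaces
                  if not ns["metadata"].get("labels"))


def image_violations(pods, allowed_prefixes):
    """Return (pod name, image) pairs whose image is outside the allowed repositories.

    End each prefix with "/" so that a similarly named host cannot match.
    """
    violations = []
    for pod in pods:
        for container in pod["spec"]["containers"]:
            if not container["image"].startswith(tuple(allowed_prefixes)):
                violations.append((pod["metadata"]["name"], container["image"]))
    return violations


# Hypothetical exported objects.
namespaces = [{"metadata": {"name": "retail", "labels": {"team": "shop"}}},
              {"metadata": {"name": "scratch"}}]
pods = [{"metadata": {"name": "api"}, "spec": {"containers": [
    {"image": "us-central1-docker.pkg.dev/my-project/migrated/api:1.4.2"},
    {"image": "docker.io/library/busybox:1.36"}]}}]
allowed = ["us-central1-docker.pkg.dev/my-project/"]
print(namespace_violations(namespaces))
print(image_violations(pods, allowed))
```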
Refactor your workloads
A best practice to design containerized workloads is to avoid dependencies on the container orchestration platform. This might not always be possible in practice due to the requirements and the design of your workloads. For example, your workloads might depend on environment-specific features that are available in your source environment only, such as add-ons, extensions, and integrations.
Although you might be able to migrate most workloads as-is to GKE, you might need to spend additional effort to refactor workloads that depend on environment-specific features, in order to minimize these dependencies, eventually switching to alternatives that are available on GKE.
To refactor your workloads before migrating them to GKE, you do the following:
- Review source environment-specific features, such as add-ons, extensions, and integrations.
- Adopt suitable alternative GKE solutions.
- Refactor your workloads.
Review source environment-specific features
If you're using source environment-specific features, and your workloads depend on these features, you need to:
- Find suitable alternative GKE solutions.
- Refactor your workloads in order to make use of the alternative GKE solutions.
As part of this review, we recommend that you do the following:
- Consider whether you can deprecate any of these source environment-specific features.
- Evaluate how critical a source environment-specific feature is for the success of the migration.
Adopt suitable alternative GKE solutions
After you review your source environment-specific features and map them to suitable alternative GKE solutions, you adopt these solutions in your GKE environment. To reduce the complexity of your migration, we recommend that you do the following:
- Avoid adopting alternative GKE solutions for source environment-specific features that you aim to deprecate.
- Focus on adopting alternative GKE solutions for the most critical source environment-specific features, and plan dedicated migration projects for the rest.
Refactor your workloads
While most of your workloads might work as is in GKE, you might need to refactor some of them, especially if they depended on source environment-specific features for which you adopted alternative GKE solutions.
This refactoring might involve:
- Kubernetes object descriptors, such as Deployments and Services expressed in YAML format.
- Container image descriptors, such as Dockerfiles and Containerfiles.
- Workload source code.
To simplify the refactoring effort, we recommend that you focus on applying the fewest changes that you need to make your workloads suitable for GKE, along with critical bug fixes. You can plan other improvements and changes as part of future projects.
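As an illustration of a minimal change to a Kubernetes object descriptor, the following sketch remaps the StorageClass of a parsed PersistentVolumeClaim and leaves everything else untouched. The mapping from a source class named `fast-local` to a target class named `standard-rwo` is an assumption for this example, so substitute the classes that exist in your own clusters.

```python
import copy


def remap_storage_class(manifest, class_map):
    """Return a copy of a PersistentVolumeClaim with its StorageClass remapped."""
    updated = copy.deepcopy(manifest)
    spec = updated.get("spec", {})
    current = spec.get("storageClassName")
    # Only touch PersistentVolumeClaims whose class has a known replacement.
    if updated.get("kind") == "PersistentVolumeClaim" and current in class_map:
        spec["storageClassName"] = class_map[current]
    return updated


# Hypothetical claim and hypothetical class mapping.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "reports-data"},
    "spec": {"storageClassName": "fast-local",
             "accessModes": ["ReadWriteOnce"],
             "resources": {"requests": {"storage": "20Gi"}}},
}
print(remap_storage_class(claim, {"fast-local": "standard-rwo"}))
```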
Refactor deployment and operational processes
After you refactor your workloads, you refactor your deployment and operational processes to do the following:
- Provision and configure resources in your Google Cloud environment instead of provisioning resources in your source environment.
- Build and configure workloads, and deploy them in your Google Cloud environment instead of deploying them in your source environment.
You gathered information about these processes during the assessment phase earlier in this process.
The type of refactoring that you need to consider for these processes depends on how you designed and implemented them. The refactoring also depends on what you want the end state to be for each process. For example, consider the following:
- You might have implemented these processes in your source environment and you intend to design and implement similar processes in Google Cloud. For example, you can refactor these processes to use Cloud Build, Cloud Deploy, and Infrastructure Manager.
- You might have implemented these processes in another third-party environment outside your source environment. In this case, you need to refactor these processes to target your Google Cloud environment instead of your source environment.
- A combination of the previous approaches.
Refactoring deployment and operational processes can be complex and can require significant effort. If you try to perform these tasks as part of your workload migration, the workload migration can become more complex, and it can expose you to risks. After you assess your deployment and operational processes, you likely have an understanding of their design and complexity. If you estimate that you require substantial effort to refactor your deployment and operational processes, we recommend that you consider refactoring these processes as part of a separate, dedicated project.
For more information about how to design and implement deployment processes on Google Cloud, see:
- Migrate to Google Cloud: Deploy your workloads
- Migrate to Google Cloud: Migrate from manual deployments to automated, containerized deployments
This document focuses on the deployment processes that produce the artifacts to deploy and that deploy them in the target runtime environment. The refactoring strategy depends heavily on the complexity of these processes. The following list outlines a possible general refactoring strategy:
- Provision artifact repositories on Google Cloud. For example, you can use Artifact Registry to store artifacts and build dependencies.
- Refactor your build processes to store artifacts both in your source environment and in Artifact Registry.
- Refactor your deployment processes to deploy your workloads in your target Google Cloud environment. For example, you can start by deploying a small subset of your workloads in Google Cloud, using artifacts stored in Artifact Registry. Then, you gradually increase the number of workloads deployed in Google Cloud, until all the workloads to migrate run on Google Cloud.
- Refactor your build processes to store artifacts in Artifact Registry only.
- If necessary, migrate earlier versions of the artifacts to deploy from the repositories in your source environment to Artifact Registry. For example, you can copy container images to Artifact Registry.
- Decommission the repositories in your source environment when you no longer require them.
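The phased strategy above can be sketched as a small helper that decides where a build pushes each container image. This is a minimal illustration under stated assumptions, not a prescribed implementation: the phase names, the `Registries` type, the `push_targets` function, and the example registry paths are all hypothetical. The only fixed convention used here is the Artifact Registry repository path format for container images, `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY`.

```python
from dataclasses import dataclass

# Hypothetical phase names that mirror the strategy described above.
PHASES = ("source_only", "dual_write", "artifact_registry_only")


@dataclass(frozen=True)
class Registries:
    source: str             # hypothetical repository path in the source environment
    artifact_registry: str  # LOCATION-docker.pkg.dev/PROJECT/REPOSITORY


def push_targets(registries: Registries, image: str, tag: str, phase: str) -> list[str]:
    """Return the image references that a build pushes to in the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    source_ref = f"{registries.source}/{image}:{tag}"
    target_ref = f"{registries.artifact_registry}/{image}:{tag}"
    if phase == "source_only":
        return [source_ref]
    if phase == "dual_write":
        # Store artifacts in both places while the migration is in progress.
        return [source_ref, target_ref]
    return [target_ref]


# Example values are placeholders, not real repositories.
registries = Registries(
    source="registry.example.internal/team",
    artifact_registry="us-central1-docker.pkg.dev/example-project/example-repo",
)
for ref in push_targets(registries, "frontend", "1.4.2", "dual_write"):
    print(ref)
```

A build pipeline could call a helper like this once per image and push to every returned reference. Switching the phase value is then the only change needed to move from dual writes to Artifact Registry only.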
To facilitate rollbacks in case of unanticipated issues during the migration, you can store container images both in your current artifact repositories and in Google Cloud while the migration to Google Cloud is in progress. Finally, as part of the decommissioning of your source environment, you can refactor your container image building processes to store artifacts in Google Cloud only.
Although it might not be crucial for the success of a migration, you might need to migrate earlier versions of your artifacts from your source environment to your artifact repositories on Google Cloud. For example, to support rolling back your workloads to arbitrary points in time, you might need to migrate earlier versions of your artifacts to Artifact Registry. For more information, see Migrate images from a third-party registry.
If you're using Artifact Registry to store your artifacts, we recommend that you configure controls to help you secure your artifact repositories, such as access control, data exfiltration prevention, vulnerability scanning, and Binary Authorization. For more information, see Control access and protect artifacts.
Deploy your workloads
When your deployment processes are ready, you deploy your workloads to GKE. For more information, see Overview of deploying workloads.
To prepare your workloads for deployment to GKE, we recommend that you analyze your Kubernetes descriptors. You can configure some of the Google Cloud resources that GKE automatically provisions for you by using Kubernetes labels and annotations, instead of provisioning these resources manually. For example, you can provision an internal load balancer instead of an external one by adding an annotation to a LoadBalancer Service.
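As an illustration of the internal load balancer example, the following sketch builds a LoadBalancer Service manifest that carries the annotation and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name, Service name, selector, and ports are hypothetical. The annotation key `networking.gke.io/load-balancer-type` is the one that GKE documents for internal passthrough load balancers at the time of writing; verify it against the current GKE documentation for your cluster version.

```python
import json


def internal_load_balancer_service(name: str, selector: dict, port: int, target_port: int) -> dict:
    """Build a Service manifest that asks GKE for an internal load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Assumption: annotation key as documented for GKE; check your version.
                "networking.gke.io/load-balancer-type": "Internal",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }


# Placeholder names and ports for illustration only.
manifest = internal_load_balancer_service("example-app", {"app": "example-app"}, 80, 8080)
print(json.dumps(manifest, indent=2))
```

Without the annotation, the same Service would result in an external load balancer, so reviewing annotations like this one is part of analyzing your descriptors before you deploy.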
Validate your workloads
After you deploy workloads in your GKE environment, but before you expose these workloads to your users, we recommend that you perform extensive validation and testing. This testing can help you verify that your workloads are behaving as expected. For example, you can do the following:
- Perform integration testing, load testing, compliance testing, reliability testing, and other verification procedures that help you ensure that your workloads are operating within their expected parameters, and according to their specifications.
- Examine logs, metrics, and error reports in Google Cloud Observability to identify any potential issues, and to spot trends to anticipate problems before they occur.
For more information about workload validation, see Testing for reliability.
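One way to make "operating within their expected parameters" concrete is a small gate that compares measurements against limits before you expose a workload. The sketch below is only an illustration: the metric names, the limits, and the `validation_failures` function are hypothetical, and in practice the observed values would come from your test runs or from Google Cloud Observability.

```python
def validation_failures(observed: dict, limits: dict) -> list[str]:
    """Return one message per metric that is missing or above its limit."""
    failures = []
    for metric, limit in limits.items():
        value = observed.get(metric)
        if value is None:
            # A missing measurement is treated as a failure, not as a pass.
            failures.append(f"{metric}: no measurement")
        elif value > limit:
            failures.append(f"{metric}: {value} exceeds limit {limit}")
    return failures


# Hypothetical measurements from a load test and hypothetical limits.
observed = {"error_rate": 0.002, "p95_latency_ms": 480}
limits = {"error_rate": 0.01, "p95_latency_ms": 400, "restart_count": 0}

for failure in validation_failures(observed, limits):
    print(failure)
```

An empty result means that every metric with a limit was measured and stayed within that limit. Any other result is a reason to hold back exposure and investigate.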
Expose your workloads
Once you complete the validation testing of the workloads running in your GKE environment, expose your workloads to make them reachable.
To expose workloads running in your GKE environment, you can use Kubernetes Services and a service mesh.
For more information about exposing workloads running in GKE, see:
Shift traffic to your Google Cloud environment
After you have verified that the workloads are running in your GKE environment, and after you have exposed them to clients, you shift traffic from your source environment to your GKE environment. To help you avoid large-scale migrations and all the related risks, we recommend that you gradually shift traffic from your source environment to your GKE environment.
Depending on how you designed your GKE environment, you have several options to implement a load balancing mechanism that gradually shifts traffic from your source environment to your target environment. For example, you can implement a DNS resolution policy that resolves a certain percentage of requests to IP addresses that belong to your GKE environment. Or you can implement a load balancing mechanism using virtual IP addresses and network load balancers.
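The idea of resolving a percentage of requests to the GKE environment can be sketched as a schedule of increasing weights plus a weighted choice between the two environments. This is a simplified model rather than a DNS or load balancer configuration: the function names, the number of steps, and the even step size are assumptions made for illustration.

```python
import random


def traffic_shift_schedule(steps: int) -> list[int]:
    """Return the percentage of traffic sent to GKE at each step, ending at 100."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(100 * i / steps) for i in range(1, steps + 1)]


def pick_backend(gke_percent: float, rng: random.Random) -> str:
    """Choose an environment for one request according to the current weight."""
    return "gke" if rng.random() * 100 < gke_percent else "source"


rng = random.Random(0)  # seeded only to make the simulation repeatable
for percent in traffic_shift_schedule(4):
    sample = [pick_backend(percent, rng) for _ in range(10_000)]
    share = sample.count("gke") / len(sample)
    print(f"target {percent}% -> simulated share on GKE: {share:.1%}")
```

The last step of the schedule, where the weight reaches 100, corresponds to the cutover. Between steps, you would hold the weight steady long enough to observe how your workloads behave under the increased load.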
After you start gradually shifting traffic to your GKE environment, we recommend that you monitor how your workloads behave as their loads increase.
Finally, you perform a cutover, which happens when you shift all the traffic from your source environment to your GKE environment.
For more information about load balancing, see Load balancing at the frontend.
Decommission the source environment
After the workloads in your GKE environment are serving requests correctly, you decommission your source environment.
Before you start decommissioning resources in your source environment, we recommend that you do the following:
- Back up any data to help you restore resources in your source environment.
- Notify your users before decommissioning the environment.
To decommission your source environment, do the following:
- Decommission the workloads running in the clusters in your source environment.
- Delete the clusters in your source environment.
- Delete the resources associated with these clusters, such as security groups, load balancers, and virtual networks.
To avoid leaving orphaned resources, the order in which you decommission the resources in your source environment is important. For example, certain providers require that you decommission Kubernetes Services that lead to the creation of load balancers before being able to decommission the virtual networks containing those load balancers.
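If you record which resources must be removed before others, you can derive a safe decommissioning order with a topological sort. The following sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The resource names and the dependencies between them are hypothetical examples; the real constraints depend on your provider.

```python
from graphlib import TopologicalSorter


def decommission_order(must_be_deleted_first: dict) -> list[str]:
    """Order resources so that each one comes after everything it waits for.

    Keys are resources; values are the resources to delete before the key.
    """
    return list(TopologicalSorter(must_be_deleted_first).static_order())


# Hypothetical dependencies for a source environment.
dependencies = {
    "load-balancer-services": {"workloads"},
    "clusters": {"load-balancer-services"},
    "security-groups": {"clusters"},
    "virtual-network": {"clusters", "security-groups"},
}

for step, resource in enumerate(decommission_order(dependencies), start=1):
    print(step, resource)
```

In this example, the Kubernetes Services that created load balancers are removed before the clusters, and the virtual network is removed last. If the recorded dependencies contain a cycle, `static_order` raises `graphlib.CycleError`, which signals that the constraints need to be reviewed.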
Optimize your Google Cloud environment
Optimization is the last phase of your migration. In this phase, you iterate on optimization tasks until your target environment meets your optimization requirements. The steps of each iteration are as follows:
- Assess your current environment, teams, and optimization loop.
- Establish your optimization requirements and goals.
- Optimize your environment and your teams.
- Tune the optimization loop.
You repeat this sequence until you've achieved your optimization goals.
For more information about optimizing your Google Cloud environment, see Migrate to Google Cloud: Optimize your environment and Google Cloud Architecture Framework: Performance optimization.
The following sections integrate the considerations in Migrate to Google Cloud: Optimize your environment.
Establish your optimization requirements
Optimization requirements help you narrow the scope of the current optimization iteration. For more information about optimization requirements and goals, see Establish your optimization requirements and goals.
To establish your optimization requirements for your GKE environment, start by considering the following aspects:
- Security, privacy, and compliance: help you enhance the security posture of your GKE environment.
- Reliability: help you improve the availability, scalability, and resilience of your GKE environment.
- Cost optimization: help you optimize the resource consumption and resulting spending of your GKE environment.
- Operational efficiency: help you maintain and operate your GKE environment efficiently.
- Performance optimization: help you optimize the performance of the workloads deployed in your GKE environment.
Security, privacy, and compliance
- Monitor the security posture of your GKE clusters. You can use the security posture dashboard to get opinionated, actionable recommendations to help you improve the security posture of your GKE environment.
- Harden your GKE environment. Understand the GKE security model, and how to harden your GKE clusters.
- Protect your software supply chain. For security-critical workloads, Google Cloud provides a modular set of products that implement software supply chain security best practices across the software lifecycle.
Reliability
- Improve the reliability of your clusters. To help you design a GKE cluster that is more resilient to unlikely zonal outages, prefer regional clusters over zonal or multi-zonal ones.
- Workload backup and restore. Configure a workload backup and restore workflow with Backup for GKE.
Cost optimization
For more information about optimizing the cost of your GKE environment, see:
- Right-size your GKE workloads at scale.
- Reducing costs by scaling down GKE clusters during off-peak hours.
- Identify idle GKE clusters.
Operational efficiency
To help you avoid issues that affect your production environment, we recommend that you:
- Design your GKE clusters to be fungible. By considering your clusters as fungible and by automating their provisioning and configuration, you can streamline and generalize the operational processes to maintain them and also simplify future migrations and GKE cluster upgrades. For example, if you need to upgrade a fungible GKE cluster to a new GKE version, you can automatically provision and configure a new, upgraded cluster, automatically deploy workloads in the new cluster, and decommission the old, outdated GKE cluster.
- Monitor metrics of interest. Ensure that all the metrics of interest about your workloads and clusters are properly collected. Also, verify that all the relevant alerts that use these metrics as inputs are in place, and working.
For more information about configuring monitoring, logging, and profiling in your GKE environment, see:
Performance optimization
- Set up cluster autoscaling and node auto-provisioning. Automatically resize your GKE cluster according to demand by using cluster autoscaling and node auto-provisioning.
- Automatically scale workloads. GKE supports several scaling mechanisms, such as:
  - Automatically scale workloads based on metrics.
  - Automatically scale workloads by changing the number of Pods of your Kubernetes workloads by configuring Horizontal Pod autoscaling.
  - Automatically scale workloads by adjusting resource requests and limits by configuring Vertical Pod autoscaling.
For more information, see About GKE scalability.
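To illustrate Horizontal Pod autoscaling, the following sketch builds an `autoscaling/v2` HorizontalPodAutoscaler manifest that targets average CPU utilization, and reproduces the scaling rule from the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the current metric value to the target value, rounded up. The function names, the Deployment name, and the numbers are hypothetical, and the `desired_replicas` function ignores the tolerance and stabilization behavior of the real controller.

```python
import json
import math


def cpu_hpa_manifest(deployment: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """Build a HorizontalPodAutoscaler that targets average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Apply the basic scaling rule, then clamp to the configured bounds."""
    wanted = math.ceil(current * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, wanted))


# Placeholder Deployment name and thresholds.
print(json.dumps(cpu_hpa_manifest("example-app", 2, 10, 60), indent=2))
print(desired_replicas(current=4, current_metric=90, target_metric=60,
                       min_replicas=2, max_replicas=10))
```

With four replicas running at 90% average CPU against a 60% target, the rule asks for six replicas, which is within the configured bounds of two and ten.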
What's next
- Learn when to find help for your migrations.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.
Contributors
Author: Marco Ferrari | Cloud Solutions Architect