Best Practices for Migrating Virtual Machines to Compute Engine

This article provides guidance and best practices for migrating computing workloads to Google Cloud Platform.

When you migrate a workload to the cloud, it is important to consider how you will handle the different components of your infrastructure. The methodology for moving data is different from the methodology for moving databases, which is different from the methodology for moving computing resources.

When evaluating a migration to the cloud, many customers look at costs as a main driver. With sustained use discounts (SUD) on Google Compute Engine virtual machines (VMs), costs can be significantly lower than managing hardware or virtual machines in a traditional data center. When migrating from a different cloud to Cloud Platform, you can take advantage of those same pricing advantages.

However, the advantage most customers realize is agility, because you can create virtual machines almost instantly, without having to wait for resources to be acquired and provisioned. VMs in the cloud allow businesses to quickly spin up new applications, experiment with them, and turn them off as necessary. This ability to quickly experiment, fail fast if necessary, and find out what works and what doesn’t in such a short time is a huge advantage. These abilities reduce the cost of innovation. Different groups in your company don’t need to worry about buying and then using infrastructure for a small experiment. Even coming from different cloud providers, customers can see advantages in a fast, global network, and fast startup times for VMs.

Finally, many customers take advantage of being able to consolidate overhead. Usually data centers require many different vendors, each with their own relationship, billing model, and contracts. Moving to the cloud can allow you to significantly reduce that overhead. Your staff no longer have to deal with the management overhead of running a data center. Instead, all your employees can focus on what makes your business thrive.

Because the majority of all technical workloads require computing resources, or compute, this article focuses on the tasks you need to perform to migrate VMs, and the recommended practices. However, since compute is so essential to most workloads, there is necessarily some discussion of other things that make applications work, such as databases, messaging, and analytics.

The migration process is not just one single, giant step. The following sections describe the recommended steps.

Calculating the costs

The first step, before moving any VMs, is to calculate the cost of the move. This means evaluating the cost of what you are currently running in your data centers. This is not just the cost of the physical machines. It also includes the cost of the networking gear, power, cooling, staffing, and leasing of the data centers.

When you've calculated those costs, you then need to have a cost model for the cloud. You should consider:

  • What kind of operating system will you use? Does that require a license or not?
  • What virtual machine types will you use? They come in a variety of sizes, and the cost depends on the size. You will need to have some idea of the performance characteristics of your applications.
  • Beyond virtual machines, what other services does your application require, and how are they priced?
  • Do these applications require any licensed software? How much will that cost and is it available in the cloud?

These are all cost considerations you need to have planned before a single VM is moved. Your Cloud Platform sales team can help you evaluate and calculate these costs.
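
The comparison described above can be sketched as a small cost model. The following Python example is only an illustration: every figure, field name, and the 730-hour month are assumptions made for the sketch, not actual prices, and a real model would include more line items.

```python
from dataclasses import dataclass


@dataclass
class OnPremCosts:
    """Monthly data center costs in one currency (all values hypothetical)."""
    hardware_amortization: float
    networking_gear: float
    power_and_cooling: float
    staffing: float
    facility_lease: float

    def total(self) -> float:
        return (self.hardware_amortization + self.networking_gear
                + self.power_and_cooling + self.staffing
                + self.facility_lease)


@dataclass
class CloudCosts:
    """Monthly cloud estimate in the same currency (all values hypothetical)."""
    vm_count: int
    vm_hourly_rate: float      # depends on the machine type and size
    hours_per_month: float
    os_license_per_vm: float   # zero if the operating system needs no license
    other_services: float      # storage, databases, messaging, and so on
    software_licenses: float   # licensed software the applications require

    def total(self) -> float:
        compute = self.vm_count * self.vm_hourly_rate * self.hours_per_month
        licenses = (self.vm_count * self.os_license_per_vm
                    + self.software_licenses)
        return compute + licenses + self.other_services


def monthly_difference(on_prem: OnPremCosts, cloud: CloudCosts) -> float:
    """A positive result means the cloud estimate is the cheaper one."""
    return on_prem.total() - cloud.total()


# Made-up inputs, for illustration only.
on_prem = OnPremCosts(4000.0, 500.0, 1200.0, 9000.0, 3000.0)
cloud = CloudCosts(vm_count=20, vm_hourly_rate=0.10, hours_per_month=730,
                   os_license_per_vm=0.0, other_services=800.0,
                   software_licenses=1500.0)
print(f"On-premises: {on_prem.total():.2f}")
print(f"Cloud:       {cloud.total():.2f}")
print(f"Difference:  {monthly_difference(on_prem, cloud):.2f}")
```

A model like this is most useful when you rerun it with several VM sizes and license options, because those inputs tend to change the result the most.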

If you are already using a cloud provider, these calculations might be different. For example, you won't need to consider the cost of leasing your own data center. But the requirement remains. Before migrating, it is important to understand the different billing models between your current provider and Google. For example, if you are migrating from Amazon Web Services (AWS), you can take a look at an assessment of pricing for virtual machines in the Cloud Platform blog.

Assessing the items to migrate

After you have evaluated the cost of the move, then you can start looking at what to migrate. In modern enterprises, there are many different kinds of applications, from customer-facing apps, to back office apps, to developer tools, to experimental applications. Moving all these applications at the same time and the same way doesn’t make sense.

We recommend sorting applications into three broad buckets:

  • Applications that are easy to move. These have fewer dependencies, are newer, are written internally so have no licensing considerations, and are more tolerant to scaling and other cloud patterns.
  • Applications that are difficult to move. These have more dependencies, are less tolerant to scaling, or have complex license requirements.
  • Applications that can’t be moved. Some applications that might not be good candidates to migrate run on specialized or older hardware, have business or regulatory requirements that make it necessary for them to stay in your data center, or have complex license requirements that don’t allow them to move to the cloud.

These are just examples of each of these three buckets, and it is likely your applications have many more deciding factors. Your Cloud Platform sales team can help you.
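
As a rough illustration of the sorting described above, the following Python sketch assigns applications to the three buckets. The attribute names and the dependency cutoff are hypothetical; your own deciding factors will likely differ.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    EASY = "easy to move"
    DIFFICULT = "difficult to move"
    CANNOT_MOVE = "can't be moved"


@dataclass
class App:
    name: str
    dependency_count: int
    tolerates_scaling: bool
    complex_licensing: bool
    license_forbids_cloud: bool = False
    needs_special_hardware: bool = False
    must_stay_on_premises: bool = False  # business or regulatory requirement


# Hypothetical cutoff; choose one that fits your own portfolio.
MAX_EASY_DEPENDENCIES = 3


def classify(app: App) -> Bucket:
    """Apply the three-bucket rule of thumb to a single application."""
    if (app.needs_special_hardware or app.must_stay_on_premises
            or app.license_forbids_cloud):
        return Bucket.CANNOT_MOVE
    if (app.dependency_count > MAX_EASY_DEPENDENCIES
            or not app.tolerates_scaling or app.complex_licensing):
        return Bucket.DIFFICULT
    return Bucket.EASY


# Made-up portfolio, for illustration only.
portfolio = [
    App("internal-wiki", 1, True, False),
    App("erp", 12, False, True),
    App("plant-controller", 2, True, False, needs_special_hardware=True),
]
for app in portfolio:
    print(f"{app.name}: {classify(app).value}")
```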

These considerations all apply whether you’re moving from a data center or another cloud provider.

When this work is done, you can pick your first application or applications to migrate. We strongly recommend that you migrate only a few applications at first. The first ones will not only provide the template for future migrations, but also help you define your migration processes.

Designing the migration

When you have decided which applications to move, you need to design what your cloud environment will be before you move anything. The first step is to find out how your current environment compares to Cloud Platform. The following table provides a brief overview of the comparison:

Service type | Data center | Google Cloud Platform
Compute | Physical hardware, virtualized hardware (VMware ESXi, Hyper-V, KVM, Xen) | Google Compute Engine
Storage | SAN, NAS, DAS | Persistent disk, Google Cloud Storage
Network | MPLS, VPN, hardware load balancing, DNS | Google Cloud VPN, Google Cloud Interconnect, Compute Engine load balancing, Google Domains, Google Cloud DNS
Security | Firewalls, NACLs, route tables, encryption, IDS, SSL | Compute Engine firewalls, encryption, IDS, SSL
Identity | Active Directory, LDAP | IAM, GADS, LDAP
Management | Configuration services, CI/CD tools | Deployment Manager, configuration services, continuous integration/continuous delivery (CI/CD) tools

If you are migrating from AWS, you can use the comparison guide to help evaluate how AWS and Cloud Platform services compare.

After you have compared the various services to gain an understanding of what your applications need and use, you can start planning what your environment should look like on Cloud Platform.

Before migrating any VMs, you need to build out the environment for the application. The following sections provide recommended steps.

Establishing governance

You need to establish who in your company can have permission to create, access, modify, and destroy cloud resources. You must also determine how resources will be paid for. You can find guidance in the IAM best practices documentation.

At this point, you should also determine who has accounts in the cloud. It might not be necessary for everyone at your company, or even all of your developers, to have direct access. You should establish the accounts and the policy for creating and deleting those accounts, up front.
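
One way to review such a policy is to list which accounts hold roles that allow changes. The following Python sketch works on a dictionary shaped like an IAM policy document (a list of bindings, each with a role and its members). The member addresses are made up, and the choice of which roles count as "modifying" is an assumption you would replace with your own list.

```python
from collections import defaultdict

# Hypothetical policy; the members are illustrative only.
policy = {
    "bindings": [
        {"role": "roles/owner",
         "members": ["user:admin@example.com"]},
        {"role": "roles/compute.instanceAdmin.v1",
         "members": ["group:ops@example.com"]},
        {"role": "roles/compute.viewer",
         "members": ["group:developers@example.com",
                     "group:ops@example.com"]},
    ]
}

# Assumption for this sketch: roles you consider able to change resources.
MODIFYING_ROLES = {"roles/owner", "roles/compute.instanceAdmin.v1"}


def roles_by_member(policy):
    """Invert the bindings into a member -> set of roles mapping."""
    result = defaultdict(set)
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            result[member].add(binding["role"])
    return dict(result)


def members_with_any_role(policy, roles):
    """Return the sorted members that hold at least one of the given roles."""
    wanted = set(roles)
    return sorted(member
                  for member, held in roles_by_member(policy).items()
                  if held & wanted)


print(members_with_any_role(policy, MODIFYING_ROLES))
```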

Creating a network

Before you move any VMs, the network they migrate to must exist. Similar to permissions and accounts, it’s important to create this network in advance, because establishing procedures after applications are in flight can be difficult.

When designing a network, it’s important to realize you’re not only creating a network for an application, but a pattern for your company to follow. Consider the following questions:

  • Will you have one network for each application or for each environment?
  • How will your applications access shared services?
  • Will you employ a hub-and-spoke model of networks, or a full network mesh?
  • How will you connect the various networks?
  • Will you use a VPN connection between them or use a cross-project network?

Given the array of options and the differences between companies, it is difficult to provide prescriptive advice that works in every situation. We recommend only that you evaluate your needs and choose a strategy before you start deploying applications.

The second half of network design is deciding how to connect to your existing resources. Google provides a number of different connection options, depending on your needs.

At the most basic, you can create a VPN connection between your existing resources and Google. With this service, you can create either static or dynamic routes between both locations.

If you need a faster connection to Google, you can engage with Cloud Interconnect partners, who can help you create a direct-leased line to Google.

Finally, you might choose to create a direct link to Google at one of our many direct peering locations.

Planning for operations

When you have applications running in the cloud, you need to monitor them, retain logs, and operationally manage them, just as you would in any system. You must think about these operations as part of your advance planning.

There are a number of third-party configuration tools available. Software such as Chef, Puppet, and others can help you. If you already use these tools, you should continue to use them in your Cloud Platform environment. If you don’t, we strongly recommend evaluating one to see which will work best for you, given the way your developers and operational engineers work. These tools can work with and complement Google Cloud Deployment Manager, which we recommend you incorporate into your deployment and operational management.

You have a similar decision to make for monitoring and logging. There are several third-party tools that work well on Cloud Platform. If you already use one or more of them, you should continue to do so. Otherwise, you should consider a variety of tools, including Google Stackdriver, which integrates monitoring, logging, and alerting in a single service.

Start migrating

Finally, you should migrate your first application. The first migration will serve as your template for future migrations. You will refine your process as you migrate more applications, so it’s especially important to record everything you do during the first migration.

The next section discusses high-level migration architectures and the steps required to migrate in each scenario.

Migration architectures

Broadly speaking, there are three types of migration architectures that you can follow for each application.

The first is completely redesigning an application for the cloud. While this is a viable option, it is out of scope for this article, as it is essentially the same path as creating a new application in the cloud.

The following sections discuss the second and third approaches: lift-and-shift migration and hybrid migration.

Lift-and-shift migration

This approach is generally the easiest way to start, as it requires the fewest changes to an application. This scenario entails shutting down the existing application and copying the data and virtual machines to the cloud.

Moving the data

The first step in a lift and shift is to move the data necessary to run the application. Often this is the database data, but it can also include static assets that you could move from a SAN to object storage in Cloud Storage. You would also move any data needed locally by the application to Cloud Storage, to be downloaded to Compute Engine persistent disks when your virtual machines launch.

Moving the VMs

The next step is to move your VMs. There are a number of ways to do this. Google provides documentation about how to import images manually. You can also use a third-party partner service, such as CloudEndure, as detailed later in the article.

Testing the application

After the data and the VMs are running on Cloud Platform, you should run the application in test mode to ensure it runs the way you expect. Testing includes meeting your performance metrics and checking that the app is properly deployed, monitored, and logged.

Moving to production

How long this step takes depends on the nature of the application. To change where the production application runs, there almost always needs to be some period of time when the application is offline. How to move to production without taking an application offline, from the user's perspective, is out of scope for this set of best practices.

Two things determine how long the move can take:

  • How long has it been since you did the original data import?
  • How long will it take DNS to update entries for your application's front end?

You generally have much more control over the first concern. If you need your time window for moving to production to be small, then you want to make that move as soon as possible after you import the data.

After you have determined an acceptable window:

  1. Inform your users about the maintenance window.
  2. Take down your application from its current location.
  3. Import the missing data to the appropriate location(s) in Cloud Platform.
  4. Switch the DNS entries and turn on the application in the cloud.

If all goes well, the application should be fully migrated and you can let your users start using it again.
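
To get a feel for how the two factors above affect the window, you can estimate it with simple arithmetic. The following Python sketch is a rough model only: the daily change rate, the link efficiency, and the idea that the DNS delay is about one record TTL are all simplifying assumptions, and the numbers are made up.

```python
def transfer_seconds(data_gb, bandwidth_mbps, efficiency=0.8):
    """Time to copy data_gb gigabytes over a bandwidth_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that the
    transfer actually achieves.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and 0 < efficiency <= 1")
    bits = data_gb * 8 * 1000 ** 3
    return bits / (bandwidth_mbps * 1_000_000 * efficiency)


def cutover_window_seconds(days_since_import, change_gb_per_day,
                           bandwidth_mbps, dns_ttl_seconds, efficiency=0.8):
    """Estimate the window: copy the missing data, then wait out the DNS TTL."""
    delta_gb = days_since_import * change_gb_per_day
    return (transfer_seconds(delta_gb, bandwidth_mbps, efficiency)
            + dns_ttl_seconds)


# Made-up inputs: 2 days since the import, 50 GB of changes per day,
# a 1000 Mbps link, and a 300-second DNS TTL.
window = cutover_window_seconds(2, 50, 1000, 300)
print(f"Estimated window: {window / 60:.1f} minutes")
```

The model shows why importing the data shortly before the cutover matters: the transfer term grows with every day that passes after the original import.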

Hybrid migration

The third architectural approach is a hybrid migration. Here, only part of the application moves to the cloud, typically the front end and possibly the application logic. The back end and associated services remain with the rest of the existing resources. Hybrid migration is a variation on lift and shift; it's useful for when the application can be moved to the cloud but some backend services can’t. This type of migration requires fewer steps than lift and shift. For example, data storage doesn't move to the cloud in this architecture.

Determining the network connection

The first step is to create a connection fast enough between your existing resources and Cloud Platform. What this connection looks like depends entirely on your application's profile. In some cases, a VPN might be enough. In others, you will require a dedicated, high-speed line.
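
A quick estimate can help you decide which kind of connection to evaluate first. The following Python sketch computes the bandwidth that the traffic between the cloud front end and the on-premises back end might need. The request rate, payload size, headroom factor, and link capacities are hypothetical values, not product specifications, and the sketch ignores latency, which can matter as much as throughput.

```python
def required_mbps(requests_per_second, avg_payload_kb, headroom=2.0):
    """Estimate the link bandwidth needed, with headroom for traffic peaks."""
    bits_per_second = requests_per_second * avg_payload_kb * 1000 * 8
    return bits_per_second * headroom / 1_000_000


def pick_connection(needed_mbps, options):
    """Return the name of the smallest option that meets the need, or None.

    options maps a connection name to its assumed usable capacity in Mbps.
    """
    suitable = [(capacity, name) for name, capacity in options.items()
                if capacity >= needed_mbps]
    return min(suitable)[1] if suitable else None


# Illustrative capacities only; measure what your own links deliver.
options = {"vpn": 1000, "dedicated line": 10000}

needed = required_mbps(requests_per_second=500, avg_payload_kb=64)
print(f"Needed: {needed:.0f} Mbps, candidate: {pick_connection(needed, options)}")
```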

Migrating the VMs

The next step is migrating the VM images, similar to the lift-and-shift scenario. Again, at this point you should test the application to make sure it performs as expected in the cloud, linking back to your existing resources.

Moving to production

Finally, your maintenance window is much smaller in this case because you don’t need to migrate data. You still need to:

  1. Inform your users of a maintenance window.
  2. Turn off the existing application.
  3. Switch the DNS entries.
  4. Turn on the application in the cloud.

If all goes well, your app should be running on Cloud Platform. At this point, you can once again let your users access it.

Using CloudEndure

CloudEndure is a tool chain and online service that facilitates the migration of machines from one platform to another, with minimal downtime and continuous replication. The solution comprises the following technology pillars:

  • Continuous block-level replication agent. The agent methodology allows migration of any physical, virtual, or cloud-based machines, regardless of the source machine infrastructure. The block-level methodology moves everything, including OS data, patches, configurations, applications, databases, and so on, and eliminates the risks involved in human error of missing key components. This approach has an extremely high migration-success rate for any supported OS, regardless of the application. Continuous replication ensures that the data is always in real-time sync, which reduces cutover windows by eliminating the need to catch up on periodic data changes.
  • Automated system conversion engine. Allows any source system to be automatically and quickly converted to Cloud Platform format at time of launch. This eliminates significant downtime involved in manually preparing the systems for the migration. CloudEndure's conversion engine helps ensure that the machine is ready for Compute Engine in minutes, and brings up the machine using the most up-to-date, replicated state.
  • Automated application stack orchestration. Allows the migration project engineers to define, in advance, which zones and regions each machine should be migrated to as well as the amount of CPU and RAM required to support its workload. CloudEndure also pre-configures networks and disks in preparation for the migration.

These pillars lower the primary risks and downtime involved in typical migration projects, enabling you to execute your migration project quickly, confidently, and reliably.

With CloudEndure, the process of migrating your virtual machines becomes:

  1. Run the CloudEndure agent on source machines.
  2. Monitor the CloudEndure dashboard for replication to complete.
  3. Quiesce the workload on the source machines.
  4. Run one last incremental sync for any data that might have changed.
  5. Migrate your service endpoints to the newly provisioned destination machines on Compute Engine.

For details about how to perform a VM migration, see the Migrating VMs to Compute Engine tutorial.
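
The waiting in steps 2 through 4 above follows a common polling pattern, which the following Python sketch illustrates. It does not call any CloudEndure API: `get_lag_seconds` is a hypothetical function that you would supply to report how far the replica is behind the source, and the interval and timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_for_replication(get_lag_seconds: Callable[[], float],
                         poll_interval: float = 30.0,
                         timeout: float = 3600.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the reported replication lag reaches zero.

    Returns True when the replica has caught up, or False if the timeout
    expires first. get_lag_seconds is a hypothetical callable, not part of
    any real SDK.
    """
    deadline = clock() + timeout
    while True:
        if get_lag_seconds() <= 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Simulated lag readings stand in for a real status source.
readings = iter([120.0, 30.0, 0.0])
caught_up = wait_for_replication(lambda: next(readings), poll_interval=0.0)
print("Safe to cut over" if caught_up else "Replication still behind")
```

In practice, you would quiesce the workload first, so that the lag can actually reach zero, and only then move the service endpoints.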

Example use cases

This section describes five ways that customers have used CloudEndure to rapidly migrate their workloads to take advantage of Compute Engine.

Migrating many workloads in parallel

A retail company was acquired and needed to move all of their workloads to Cloud Platform. Their Linux-based fleet of machines was spread across on-premises data centers and various cloud providers. While some machines were easy to move over, they knew they would be challenged in moving over their database and SAN devices with minimal downtime. Each minute of downtime for the retailer would mean loss of revenue.

CloudEndure was able to continuously replicate their entire fleet, in parallel, while also upgrading their Linux kernels on the destination Compute Engine VMs. They migrated the SAN storage volumes directly into Compute Engine persistent disks. When all replication had completed, the cutover window was narrowed to just a few minutes.

Migrating Windows workloads running on VMware

A company with multiple data centers in Asia had a heavy footprint of Windows virtual machines running on VMware. They needed to move their Windows Server 2008 R2 and Windows Server 2012 R2 servers to Cloud Platform, with minimal downtime to their critical services, such as Active Directory and Exchange.

Despite the many disparate apps and the mix of Windows versions, they were able to apply the same migration approach to all of their systems by using CloudEndure’s disk-level replication.

Migrating a large multi-tiered application

A company with a large, multi-tiered application workload wanted to migrate their servers from their data center to the cloud. Ninety percent of their on-premises workload was virtual, and the customer was aware of tools that allowed them to slowly ship virtual disk snapshots into the cloud and perform manual system modifications to make the machines cloud-ready. Although not ideal, because significant downtime would be required during cutover, it was possible. However, 10% of the workload was running on physical servers, which happened to host critical databases. This posed a significant migration problem because the customer couldn't use any hypervisor-based methodologies to migrate these crucial physical servers.

Using CloudEndure's infrastructure-agnostic agent, coupled with its automated machine conversion engine, the customer was able to move both the physical and virtual servers into the cloud, using the same process and tools. Furthermore, CloudEndure's disk-level replication approach kept the migration process identical across all application tiers. Block-level replication reduced the cutover downtime to minutes, instead of many hours.

Migrating a frequently updated app

A company with a frequently updated application workload wanted to migrate to the cloud. Initially, the customer set up pre-configured, standby systems in the cloud, and attempted to move the source application data into the target standby systems. They planned to promote the standby applications to primary ones at a later date. However, moving the data and keeping it in sync proved to be a time-consuming endeavor, during which the source application and OS continued to be frequently updated. The customer grew more concerned over time that some patches or application changes would be left behind during the cutover stage, causing an outage.

CloudEndure's block-level replication approach addressed the concern. CloudEndure ensured that all the disk blocks would be replicated in a consistent state. Target disks would not only maintain the actual application data in its most up-to-date state, but OS patches, updates, application configuration, and more, would be kept intact, as well.

Migrating from multiple data centers

A company with multiple data centers, both on-premises and in colocation facilities, needed to consolidate all of them as part of a migration to the cloud. Aside from the typical challenges of moving applications from one infrastructure to another, the customer also ran into networking challenges. Some of the applications were using identical, private IP space in multiple segregated networks, which would have resulted in conflicts once migrated into a single, consolidated, cloud-based network. It was clear that networking changes would be required in the cloud before the migration could be executed. It was critical to be able to make such changes and test them easily, and in a non-disruptive fashion.

CloudEndure's target-machine-blueprint mechanism allowed the customer to define and redefine, as frequently as needed, how the target-server network settings were going to be provisioned in the cloud. After each blueprint configuration iteration, the customer could test spinning up the target servers in an isolated cloud network and verify their behavior without impacting the source servers. When all test criteria had been met, the cutover window was scheduled and the migration cutover was executed, with high predictability and low risk.
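
The address conflict in this use case is something you can detect before a migration. The following Python sketch uses the standard `ipaddress` module to find source networks whose private ranges overlap. The network names and ranges are made up for illustration, and the check is independent of CloudEndure.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return pairs of network names whose address ranges overlap.

    networks maps a name to a CIDR string, such as "10.0.0.0/16".
    """
    parsed = {name: ipaddress.ip_network(cidr)
              for name, cidr in networks.items()}
    return [(name_a, name_b)
            for (name_a, net_a), (name_b, net_b)
            in combinations(parsed.items(), 2)
            if net_a.overlaps(net_b)]


# Hypothetical source networks from segregated data centers.
source_networks = {
    "dc-east": "10.0.0.0/16",
    "dc-west": "10.0.0.0/16",
    "colo": "192.168.10.0/24",
}

for name_a, name_b in find_overlaps(source_networks):
    print(f"Conflict: {name_a} and {name_b} use overlapping ranges")
```

Any pair that this check reports needs new addressing in the consolidated cloud network before the cutover.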
