This article is the first part of a multi-part series that discusses hybrid and multi-cloud deployments, architecture patterns, and network topologies. This part explores the opportunities and challenges of hybrid and multi-cloud deployments, and provides guidance on how to approach and implement a hybrid setup that uses Google Cloud.
The series consists of these parts:
- Hybrid and multi-cloud patterns and practices (this article)
- Hybrid and multi-cloud architecture patterns
- Hybrid and multi-cloud network topologies
Digitalization and the need to adapt rapidly to changing market demands have caused a rise in the requirements and expectations that are placed on enterprise IT. Many companies find it challenging to accommodate and adapt to these trends by using existing infrastructure and processes.
At the same time, IT departments often find themselves under scrutiny and pressure to improve cost effectiveness, making it difficult to justify additional capital expenditure (capex) investments to extend and modernize data centers and equipment.
A hybrid cloud strategy provides a pragmatic solution. By using the public cloud, you can extend the capacity and capabilities of your IT without up-front capex investments. By adding one or more cloud deployments to your existing infrastructure, you not only preserve your existing investments, but also avoid committing yourself to a single IT vendor. Additionally, by using a hybrid strategy, you can modernize applications and processes incrementally as resources permit.
Hybrid cloud and multi-cloud
Because workloads, infrastructure, and processes are unique to each enterprise, each hybrid strategy must be adapted to specific needs. The result is that the terms hybrid cloud and multi-cloud are sometimes used inconsistently.
Within the context of Google Cloud, the term hybrid cloud describes a setup in which common or interconnected workloads are deployed across multiple computing environments, one of which is based in the public cloud and at least one of which is private.
The most common example is combining a private computing environment, usually an existing, on-premises data center, and a public cloud computing environment, as the following diagram shows.
The term multi-cloud describes setups that combine at least two public cloud providers, as in the following diagram.
A multi-cloud setup might also include private computing environments.
Drivers for hybrid cloud and multi-cloud setups
Hybrid and multi-cloud setups might be temporary, maintained only for a limited time to facilitate a migration. However, these setups might also represent the future state of most organizations as they build new systems and evolve existing ones to get the best from each environment, no matter where a workload runs. Hybrid and multi-cloud setups might therefore be permanent fixtures in the IT landscape.
A hybrid or multi-cloud setup is rarely a goal in itself, but rather a means of meeting business requirements. Choosing the right hybrid or multi-cloud setup therefore requires first clarifying these requirements.
Business drivers and constraints
Common drivers and constraints from the business side include the following:
- Reducing capex or general IT spending.
- Increasing flexibility and agility to respond better to changing market demands.
- Building out capabilities, such as advanced analytics services, that might be difficult to implement in existing environments.
- Improving quality and availability of service.
- Improving transparency regarding costs and resource consumption.
- Heeding laws and regulations about data sovereignty.
- Avoiding or reducing vendor lock-in.
Design and development drivers
Common drivers from the design and development side include the following:
- Automating and accelerating application rollouts to achieve faster time to market and shorter cycle times.
- Leveraging high-level APIs and services to speed up development.
- Accelerating the provisioning of compute and storage resources.
Operations requirements and constraints
Requirements and constraints to consider from the operations side include the following:
- Ensuring consistent authentication, authorization, auditing, and policies across computing environments.
- Using consistent tooling and processes to limit complexity.
- Providing visibility across environments.
On the architecture side, the biggest constraints often stem from existing systems and can include the following:
- Dependencies between applications.
- Performance and latency requirements for communication between systems.
- Reliance on hardware or operating systems that might not be available in the public cloud.
- Licensing restrictions.
The goal of a hybrid and multi-cloud strategy is to meet these requirements with a plan that describes the following:
- Which workloads should be run in or migrated to each computing environment.
- Which patterns to apply across multiple workloads.
- Which technology and network topology to use.
Fundamentally, any hybrid and multi-cloud strategy is derived from the business requirements. How you derive a usable strategy from the business requirements is rarely clear, however. The workloads, architecture patterns, and technologies you choose not only depend on the business requirements, but also influence each other in a cyclic fashion. The following diagram illustrates this cycle.
Defining a vision
Within this web of dependencies and constraints, defining a plan that considers all workloads and requirements is difficult at best, especially in a complex IT environment. In addition, planning takes time and might lead to competing stakeholder interests.
To avoid this situation, first develop a vision statement that focuses on the business perspective and addresses the following questions:
- Why is the current approach and computing environment insufficient?
- What are the primary metrics that you want to optimize for by using the public cloud?
- How long do you plan to use a hybrid or multi-cloud setup? Do you consider this setup permanent, or interim for the length of a full cloud migration?
The vision statement does not address how to achieve these goals.
Agreeing on a vision and obtaining relevant stakeholder sign-off provide a foundation for the next steps in the planning process.
Designing a hybrid and multi-cloud strategy
After you have settled on a vision, you can elaborate the strategy:
- Conduct an initial workload assessment. Considering the goals outlined in the vision document, identify a candidate list of planned and existing workloads that could benefit from being deployed or migrated to the public cloud. The following section discusses this topic in more detail.
- Starting with the identified candidate workloads, identify applicable patterns and, based on those patterns, candidate topologies.
- If you identify more than one applicable pattern and topology, refine your workload selection so that you can settle on a single pattern and topology. Iterate as necessary to refine your selections.
  Applying multiple patterns and topologies is a viable approach for large organizations. But this approach is rarely ideal because of the extra complexity, which in turn might slow your progress.
- Prioritize your workloads. Given the many requirements, it's best to take an iterative approach.
- Select an initial workload to put in the public cloud. Make sure that this workload is not business critical or too difficult to migrate, yet typical enough to serve as a blueprint for upcoming deployments or migrations.
- While selecting a workload to migrate, start preparing on the Google Cloud side.
  - Set up the Google Cloud organization, projects, and policies that you need in order to prepare your cloud environment for its first deployments.
  - Implement the network topology and establish the necessary connectivity between Google Cloud and your private computing environments.
The decision about which workloads to run on which computing environments has a profound impact on the effectiveness of a hybrid and multi-cloud strategy. Putting the wrong workload on the cloud can complicate your deployment while providing little benefit. Putting an appropriate workload in the right place not only helps the workload, but helps you learn about the benefits of each environment.
A common way to begin using the public cloud is cloud first. In this approach, you deploy new workloads to the public cloud. In that case, consider a classic deployment to a private computing environment only if a cloud deployment is not possible for technical or organizational reasons.
The cloud-first strategy has advantages and disadvantages. On the positive side, it's forward looking. You can deploy new workloads in a clean and cloud-native fashion while avoiding (or at least minimizing) the hassles of migrating existing workloads.
On the downside, using a cloud-first strategy might cause you to miss opportunities for your existing workloads. New workloads might constitute only a fraction of your overall IT workload, and their impact on overall IT spending and performance might be limited. The time you spend migrating an existing workload might yield bigger advantages or savings than trying to accommodate a new workload in the cloud.
Following a strict cloud-first strategy also risks increasing the overall complexity of your IT environment. This approach might create redundancies, lower performance due to excessive cross-environment communication, or result in a computing environment that is not well suited for the individual workload.
Considering these risks, you might be better off using a cloud-first approach only for selected workloads. That way you can concentrate on workloads that can benefit the most from a cloud deployment or migration. This approach also takes into account the modernization of existing workloads, which is discussed in the next section.
Migration and modernization
Hybrid/multi-cloud and IT modernization are distinct concepts that are linked in a virtuous circle. Using the public cloud can facilitate and simplify the modernization of IT workloads, and modernizing your IT will help you get more from the cloud.
The primary goals of modernizing workloads are as follows:
- Achieving greater agility so that you can adapt to changing requirements.
- Reducing costs of infrastructure and operations.
- Increasing reliability and resiliency in order to minimize risk for the business.
As described in Migration to Google Cloud, you can implement one of the following migration types, or even combine multiple types as needed:
- Lift and shift
- Improve and move
- Rip and replace
Lift and shift
Lift and shift describes the process of migrating a workload from a private computing environment to the public cloud without changing the workload in any significant manner. Most commonly, this process involves migrating existing virtual machines (VMs) and their images to Compute Engine.
Running VMs in Compute Engine rather than in a private computing environment has these benefits:
- You can provision computing and storage resources quickly, avoiding delays that are caused by procuring and installing equipment in classic (private or on-premises) data centers.
- You pay only for the compute resources that you use, with no up-front commitment or investment.
- You can automate operational tasks and reduce effort and costs as a result.
If you also then rewrite applications to become more cloud native, you can unlock significant additional benefits:
- By using autoscaling, you can ensure that computing resources are provisioned only when they are needed, avoiding any over-provisioning costs.
- You can take advantage of cluster managers such as Kubernetes to increase the resiliency of your applications by automatically restarting them or migrating them to different machines in case of failure.
- You can further reduce the operational overhead by using managed services.
- You can automate deployment, which helps accelerate product development and release processes, which in turn can help your business react more quickly to feedback, changing requirements, and market demands.
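To make the autoscaling benefit concrete, the following minimal sketch compares the instance-hours consumed by a fleet that is sized for peak demand all day with a fleet that is resized every hour. The demand profile, the per-instance capacity, and the function names are hypothetical and chosen only for illustration. Real autoscalers react with some delay and usually keep headroom, so actual savings will differ.

```python
import math


def static_instance_hours(hourly_demand, capacity_per_instance):
    """Instance-hours when the fleet is sized for peak demand in every hour."""
    peak_instances = math.ceil(max(hourly_demand) / capacity_per_instance)
    return peak_instances * len(hourly_demand)


def autoscaled_instance_hours(hourly_demand, capacity_per_instance, min_instances=1):
    """Instance-hours when the fleet is resized each hour to match demand."""
    return sum(
        max(min_instances, math.ceil(demand / capacity_per_instance))
        for demand in hourly_demand
    )


# Hypothetical requests-per-second profile for one day (24 hourly samples).
demand = [20] * 8 + [100] * 4 + [300] * 4 + [100] * 4 + [20] * 4

# Assumption: one instance can serve 50 requests per second.
static_hours = static_instance_hours(demand, capacity_per_instance=50)
scaled_hours = autoscaled_instance_hours(demand, capacity_per_instance=50)

print(static_hours)  # 144 instance-hours when provisioned for peak
print(scaled_hours)  # 52 instance-hours when scaled hourly
```

The gap between the two numbers is the over-provisioned capacity that an idealized autoscaler would avoid for this particular demand curve.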
As this diagram shows, when you are modernizing an existing workload, consider shifting the application to the cloud and improving the application to become cloud native.
Improve and move
Although it is common to shift an application to the cloud before investing in improvement, the reverse approach might be better for some applications. The idea of improve and move is to begin a migration by refactoring and modernizing an application that is already in place. Even before you move the application to the cloud, this improvement has a number of benefits:
- You can improve the deployment process.
- Investing in continuous integration/continuous deployment (CI/CD) infrastructure and tooling can speed up the release cadence and shorten feedback cycles.
After the improvement, you move the application to the cloud, which helps you to provision resources quickly and increase cost efficiency by using autoscaling and therefore not over-provisioning.
For improve and move to work well, consider making certain investments in on-premises infrastructure and tooling, such as setting up a local Docker registry and provisioning Kubernetes clusters to containerize applications.
Rip and replace
Rip and replace refers to removing a system and replacing it. In some cases, trying to evolve an existing system and code base might not be cost effective or even possible. Requirements might have changed substantially, or the existing application might be based on a software or hardware stack that is not fit for future investments. In such cases, a better approach might be to replace the system, which might mean either purchasing a new solution or developing a modern and cloud-native application from scratch.
Mixing and matching migration approaches
Each of the three migration approaches has certain strengths and weaknesses. A key advantage of following a hybrid and multi-cloud strategy is that it is not necessary to settle on a single approach. Instead, you can decide which approach works best for each workload.
Choose lift and shift if any of the following is true of the workloads:
- They have a relatively small number of dependencies on their environment.
- They are not considered worth refactoring.
- They are based on third-party software.
Consider improve and move for these types of workloads:
- They have dependencies that must be untangled.
- They rely on operating systems, hardware, or database systems that cannot be accommodated in the cloud.
- They are not making efficient use of compute or storage resources.
- They cannot easily be deployed in an automated fashion.
Finally, rip and replace might be best for these types of workloads:
- They no longer satisfy current requirements.
- They are based on third-party technology that has reached its end of life.
- They require third-party license fees that are no longer economical.
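The criteria above can be sketched as a simple rule-based helper. This is only an illustration: the `Workload` fields, the `suggest_approach` function, and in particular the order in which the rules are checked (rip and replace first, then improve and move, then lift and shift) are assumptions made for this example, not part of any Google Cloud tool. A real assessment would weigh these signals rather than treat them as hard rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Approach(Enum):
    LIFT_AND_SHIFT = "lift and shift"
    IMPROVE_AND_MOVE = "improve and move"
    RIP_AND_REPLACE = "rip and replace"


@dataclass
class Workload:
    """Hypothetical checklist that mirrors the criteria listed above."""
    name: str
    # Signals for rip and replace
    fails_current_requirements: bool = False
    third_party_end_of_life: bool = False
    license_fees_uneconomical: bool = False
    # Signals for improve and move
    tangled_dependencies: bool = False
    needs_platform_unavailable_in_cloud: bool = False
    inefficient_resource_use: bool = False
    hard_to_automate_deployment: bool = False
    # Signals for lift and shift
    few_environment_dependencies: bool = False
    not_worth_refactoring: bool = False
    third_party_software: bool = False


def suggest_approach(w: Workload) -> Optional[Approach]:
    """Return a suggested approach, or None if no criterion matches."""
    if w.fails_current_requirements or w.third_party_end_of_life or w.license_fees_uneconomical:
        return Approach.RIP_AND_REPLACE
    if (w.tangled_dependencies or w.needs_platform_unavailable_in_cloud
            or w.inefficient_resource_use or w.hard_to_automate_deployment):
        return Approach.IMPROVE_AND_MOVE
    if w.few_environment_dependencies or w.not_worth_refactoring or w.third_party_software:
        return Approach.LIFT_AND_SHIFT
    return None  # No signal recorded: review this workload manually.


# Hypothetical workloads, for illustration only.
print(suggest_approach(Workload("legacy-crm", third_party_end_of_life=True)).value)
print(suggest_approach(Workload("order-service", tangled_dependencies=True)).value)
print(suggest_approach(Workload("file-server", few_environment_dependencies=True)).value)
```

Returning `None` when nothing matches keeps the helper from silently defaulting to one approach for workloads that have not been assessed yet.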
In most migrations, shifting a workload to the cloud is a one-time, irreversible effort. But in hybrid and especially in multi-cloud scenarios, you might want to be able to shift workloads between clouds later. To facilitate this ability, make sure that your workloads are portable:
- Make sure you can shift a workload from one computing environment to another without significant modification.
- Make sure that application deployment and management are consistent across computing environments.
- Make sure that keeping the workload portable does not conflict with the workload being cloud native.
At the infrastructure level, you can use tools such as Terraform to automate and unify creation of infrastructure resources such as VMs and load balancers in heterogeneous environments. Additionally, you can use configuration management tools such as Ansible, Puppet, or Chef to establish a common deployment and configuration process. Alternatively, you can use an image-baking tool like Packer to create VM images for different platforms by using a single, shared configuration file. Finally, you can use solutions such as Prometheus and Grafana to help ensure consistent monitoring across environments.
Based on these tools, you can assemble a tool chain similar to the one in the following diagram. This tool chain abstracts away the differences between computing environments, and it lets you unify provisioning, deployment, management, and monitoring.
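The idea of abstracting away environment differences can be sketched in a few lines of code. The following example is a simplified illustration, not a representation of how Terraform, Ansible, or Packer work internally: the `Environment` protocol, the two adapter classes, and the `roll_out` function are hypothetical names. It shows only the design principle that a rollout is written once against a common interface while each environment supplies its own adapter.

```python
from typing import List, Protocol


class Environment(Protocol):
    """Common interface that every computing environment adapter implements."""
    name: str

    def provision_vm(self, image: str) -> str: ...


class OnPremisesEnvironment:
    """Hypothetical adapter for a private data center."""
    name = "on-premises"

    def provision_vm(self, image: str) -> str:
        # A real adapter would call the local virtualization platform here.
        return f"{self.name}: started VM from image {image}"


class PublicCloudEnvironment:
    """Hypothetical adapter for a public cloud provider."""
    name = "public-cloud"

    def provision_vm(self, image: str) -> str:
        # A real adapter would call the provider's API here.
        return f"{self.name}: started VM from image {image}"


def roll_out(image: str, environments: List[Environment]) -> List[str]:
    """Deploy the same image everywhere through the shared interface."""
    return [env.provision_vm(image) for env in environments]


results = roll_out("web-v1", [OnPremisesEnvironment(), PublicCloudEnvironment()])
for line in results:
    print(line)
```

Because `roll_out` depends only on the shared interface, adding another environment means writing one more adapter rather than changing the rollout logic.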
Although a common tool chain can help you achieve portability, it is subject to several shortcomings:
- You might not be able to make use of certain features that a cloud environment offers natively. Specifically, using VMs as a common foundation makes it difficult to implement truly cloud-native applications. Sometimes, using VMs prevents you from using managed services, so you might miss opportunities to reduce administrative overhead.
- Building up and maintaining the tool chain incurs overhead and operational costs.
- Over time, the tool chain might grow to become complex in ways that are unique to your company. This complexity can lead to increased training costs.
Containers and Kubernetes
Building and maintaining a custom tool chain to achieve workload portability by using VMs involves many challenges. One solution is to leverage containers and Kubernetes instead.
Containers help your software to run reliably when you move it from one environment to another. Kubernetes handles the orchestration, deployment, scaling, and management of your containerized applications, providing the services that form the foundation of a cloud-native application. Because you can install and run Kubernetes on a variety of computing environments, you can also use it to establish a common runtime layer across computing environments:
- Kubernetes provides the same services and APIs in a cloud or private computing environment. Moreover, the level of abstraction is much higher than when working with VMs, which generally translates into less required groundwork and improved developer productivity.
- Unlike a custom tool chain, Kubernetes is widely adopted for both development and application management, so you can tap into existing expertise, documentation, and third-party support.
- Kubernetes runs standard container images, such as those built with Docker, a widely adopted format for application packaging that is not tied to any specific vendor. Kubernetes itself is open source and governed by the Cloud Native Computing Foundation.
You can avoid the effort of installing and operating Kubernetes by using a managed Kubernetes platform such as Google Kubernetes Engine (GKE), so operations staff can shift their focus from infrastructure to applications. The following diagram shows what a managed Kubernetes platform might look like.
Limits to workload portability
To help make your workloads more portable, Kubernetes provides a layer of abstraction that can hide many of the intricacies of and differences between computing environments. That abstraction has some limitations, however:
- An application might be portable to a different environment with minimal changes, but that doesn't mean that the application will perform equally well in both environments. Differences in underlying compute or networking infrastructure along with proximity to dependent services might lead to substantially different performance.
- Moving a workload between computing environments might also require you to move data. In addition to the time, effort, and budget that is needed to copy or move data between computing environments, those environments often differ in the services and facilities that they provide to store and manage such data.
- Kubernetes offers a unified way to provision different kinds of load balancers. The behavior of these load balancers is not defined in detail, however, and might differ between environments in subtle ways.
- GKE integrates role-based access control (RBAC) with Identity and Access Management, but in other environments, the ways to configure RBAC and secure workloads might differ.
Even with Kubernetes, it can be challenging to abstract away differences between computing environments or public clouds. Workload portability aims primarily to simplify migrations between environments, not to automate them.
When you have new projects that are in progress and hundreds or even thousands of workloads that are already running, it can be daunting to decide which workloads to deploy or migrate to which computing environment.
To help you make such decisions consistently and objectively, consider categorizing and scoring workloads by opportunity, risk, and technical difficulty.
These factors can help you evaluate migration opportunities:
- Potential for market differentiation or innovation that is enabled by using cloud services
- Potential savings in total cost of ownership for an application
- Potential improvements in availability, resiliency, security, or performance
- Potential speedup of development and release processes
These factors can help you evaluate migration risks:
- Potential impact of outages that are caused by a migration or by the fact that your experience with public cloud deployments might be limited at first
- The need to comply with any existing legal or regulatory restrictions
These factors can help you evaluate the technical difficulties of a migration:
- Size, complexity, and age of the application
- Number of dependencies on other applications
- Any restrictions that third-party licenses impose
- Dependencies on specific versions of operating systems, databases, or other environment configurations
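One way to apply these three groups of factors is to give each workload a score per category and rank the results. The sketch below assumes a 1 to 5 scale, equal default weights, and a simple formula (opportunity minus risk minus difficulty). All of these choices, along with the workload names, are hypothetical and would need to be calibrated to your organization.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assessment:
    """Scores on an assumed scale from 1 (low) to 5 (high)."""
    name: str
    opportunity: int
    risk: int
    difficulty: int


def priority(a: Assessment, w_opportunity: float = 1.0,
             w_risk: float = 1.0, w_difficulty: float = 1.0) -> float:
    """Higher values indicate better candidates for an early migration."""
    return (w_opportunity * a.opportunity
            - w_risk * a.risk
            - w_difficulty * a.difficulty)


def rank(assessments: List[Assessment]) -> List[Assessment]:
    """Sort workloads from most to least attractive migration candidate."""
    return sorted(assessments, key=priority, reverse=True)


# Hypothetical workloads and scores, for illustration only.
candidates = [
    Assessment("core-billing", opportunity=5, risk=5, difficulty=4),
    Assessment("reporting-batch", opportunity=4, risk=2, difficulty=1),
    Assessment("internal-wiki", opportunity=2, risk=1, difficulty=1),
]

for a in rank(candidates):
    print(f"{a.name}: {priority(a):+.1f}")
```

With these example scores, the business-critical billing system ranks last despite its high opportunity, which matches the earlier advice to avoid a business-critical workload as the first migration.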
After you have assessed the initial workloads, you can begin to prioritize workloads and identify applicable architecture patterns and network topologies. This step might require multiple iterations. Because your assessment might change over time, it is also worth reevaluating workloads after you do your first cloud deployments.
- Learn more about how to get started with your migration to Google Cloud.
- Learn about common architecture patterns for hybrid and multi-cloud, which scenarios they are best suited for, and how to apply them.
- Find out more about network topologies for hybrid and multi-cloud, and how to implement them.
- Read about our best practices for migrating VMs to Compute Engine.
- Explore how our partners can help you migrate your workloads to Google Cloud.
- Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.