Application deployment and testing strategies

This document provides an overview of commonly used application deployment and testing patterns. It looks at how the patterns work, the benefits they offer, and things to consider when you implement them.

Suppose you want to upgrade a running application to a new version. To ensure a seamless rollout, you would typically consider the following:

  • How to minimize application downtime, if any.
  • How to manage and resolve incidents with minimal impact on users.
  • How to address failed deployments in a reliable, effective way.
  • How to minimize people and process errors to achieve predictable, repeatable deployments.

The deployment pattern you choose largely depends on your business goals. For example, you might need to roll out changes without any downtime, or roll out changes to an environment or a subset of users before you make a feature generally available. Each methodology discussed in this document accounts for particular goals that you need to meet before a deployment is deemed successful.

This document is intended for system administrators and DevOps engineers who work on defining and implementing release and deployment strategies for various applications, systems, and frameworks. Using examples based on Google Kubernetes Engine (GKE), an accompanying tutorial shows how to implement the deployment and testing techniques discussed in this document.

Deployment strategies

When you deploy a service, it's not always exposed immediately to users. Sometimes, it's only after the service is released that users see changes in the application. However, when a service is released in-place, deployment and release occur simultaneously. In this case, when you deploy the new version, it starts accepting production traffic. Alternatively, there are deployment strategies for provisioning multiple service versions in parallel. These deployment patterns let you control and manage which version receives an incoming request. For more information on deployments, releases, and related concepts, see "Kubernetes and the challenges of continuous software delivery."

The deployment patterns discussed in this section offer you flexibility in automating the release of new software. Which approach is best for you depends on your goals.

Recreate deployment pattern

With a recreate deployment, you fully scale down the existing application version before you scale up the new application version.

The following diagram shows how a recreate deployment works for an application.

The flow of a recreate deployment.

Version 1 represents the current application version, and Version 2 represents the new application version. When you update the current application version, you first scale down the existing replicas of Version 1 to zero, and then you deploy all replicas of the new version at the same time.
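The ordering described above can be sketched as a small simulation. This is only an illustration, not a deployment tool: the function name recreate and the use of version labels to stand in for replicas are assumptions made for this example.

```python
def recreate(instances, new_version):
    """Simulate a recreate deployment on a list of version labels.

    Returns the sequence of fleet states, including the empty state
    during which no replica is serving traffic.
    """
    replica_count = len(instances)
    states = [list(instances)]
    # Step 1: scale the old version down to zero replicas.
    states.append([])
    # Step 2: bring up the same number of replicas on the new version.
    states.append([new_version] * replica_count)
    return states


states = recreate(["v1", "v1", "v1"], "v2")
for state in states:
    print(state or "no instances serving (downtime)")
```

The empty middle state is the downtime window that this pattern accepts in exchange for never running two versions side by side.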

Key benefits

The advantage of the recreate approach is its simplicity. You don't have to manage more than one application version in parallel, and therefore you avoid backward compatibility challenges for your data and applications.

Considerations

The recreate method involves downtime during the update process. Downtime is not an issue for applications that can handle maintenance windows or outages. However, if you have mission-critical applications with strict service level agreements (SLAs) and high availability requirements, you might choose a different deployment strategy.

Rolling update deployment pattern

In a rolling update deployment, you update a subset of running application instances instead of simultaneously updating every application instance, as the following diagram shows.

The flow of a rolling update deployment.

In this deployment approach, the number of instances that you update simultaneously is called the window size. In the preceding diagram, the rolling update has a window size of 1. One application instance is updated at a time. If you have a large cluster, you might increase the window size.

With rolling updates, you have flexibility in how you update your application:

  • You can scale up the application instances with the new version before you scale down the old version (a process known as a surge upgrade).
  • You can specify the maximum number of application instances that remain unavailable while you scale up new instances in parallel.
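As a rough illustration of the window size, the following sketch updates a list of version labels in batches. It models an in-place update only (no surge capacity), and the function name rolling_update and its parameters are hypothetical names chosen for this example.

```python
def rolling_update(instances, new_version, window_size=1):
    """Yield the fleet state after each batch of `window_size` updates."""
    fleet = list(instances)
    for start in range(0, len(fleet), window_size):
        # Replace one window of instances with the new version.
        for i in range(start, min(start + window_size, len(fleet))):
            fleet[i] = new_version
        yield list(fleet)


steps = list(rolling_update(["v1"] * 4, "v2", window_size=1))
for step in steps:
    print(step)
```

With a window size of 1, four instances take four steps to update; a window size of 2 would halve the number of steps but put more instances in transition at once.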

Key benefits

  • No downtime. Based on the window size, you incrementally update deployment targets, for example, one by one or two by two. You direct traffic to the updated deployment targets only after the new version of the application is ready to accept traffic.
  • Reduced deployment risk. When you roll out an update incrementally, any instability in the new version affects only a portion of the users.

Considerations

  • Slow rollback. If the new rollout is unstable, you can terminate the new replicas and redeploy the old version. However, like a rollout, a rollback is a gradual, incremental process.
  • Backward compatibility. Because new code and old code live side by side, users might be routed to either one of the versions arbitrarily. Therefore, ensure that the new deployment is backward compatible; that is, the new application version can read and handle data that the old version stores. This data can include data stored on disk, in a database, or as part of a user's browser session.
  • Sticky sessions. If the application requires session persistence, we recommend that you use a load balancer that supports stickiness and connection draining. We also recommend that you share sessions when possible (through session replication or session management using a datastore) so that sessions are decoupled from the underlying resources.

Blue/green deployment pattern

In a blue/green deployment (also known as a red/black deployment), you perform two identical deployments of your application, as the following diagram shows.

The flow of a blue/green deployment.

In the diagram, blue represents the current application version and green represents the new application version. Only one version is live at a time. Traffic is routed to the blue deployment while the green deployment is created and tested. After you're finished testing, you route traffic to the new version.

After the deployment succeeds, you can either keep the blue deployment for a possible rollback or decommission it. Alternatively, you can deploy a newer version of the application on these instances. In that case, the current (blue) environment serves as the staging area for the next release.
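The cutover and rollback behavior can be pictured as a router that points at exactly one environment. The sketch below is a toy model under that assumption; the class BlueGreenRouter and its methods are hypothetical and do not correspond to any particular load balancer API.

```python
class BlueGreenRouter:
    """Toy router that sends all traffic to exactly one environment."""

    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"  # blue serves traffic until cutover

    def handle(self, request):
        return self.environments[self.live](request)

    def cut_over(self):
        # Switch all traffic to the new (green) environment at once.
        self.live = "green"

    def roll_back(self):
        # Point traffic back at the untouched blue environment.
        self.live = "blue"


router = BlueGreenRouter(
    blue=lambda req: f"v1 handled {req}",
    green=lambda req: f"v2 handled {req}",
)
before = router.handle("/home")
router.cut_over()
after = router.handle("/home")
router.roll_back()
rolled_back = router.handle("/home")
print(before, after, rolled_back, sep="\n")
```

Because both environments stay up, cutover and rollback are each a single pointer change rather than a redeployment.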

Key benefits

  • Zero downtime. Blue/green deployment allows cutover to happen quickly with no downtime.
  • Instant rollback. You can roll back at any time during the deployment process by adjusting the load balancer to direct traffic back to the blue environment. The impact of downtime is limited to the time it takes to switch traffic to the blue environment after you detect an issue.
  • Environment separation. Blue/green deployment ensures that spinning up a parallel green environment doesn't affect resources that support the blue environment. This separation reduces your deployment risk.

Considerations

  • Cost and operational overhead. Adopting the blue/green deployment pattern can increase operational overhead and cost because you must maintain duplicate environments with identical infrastructure.
  • Backward compatibility. Blue and green deployments can share data points and datastores. We recommend that you verify that both versions of the application can use the schema of the datastore and the format of the records. This backward compatibility is necessary if you want to switch seamlessly between the two versions if you need to roll back.
  • Cutover. If you plan to decommission the current version, we recommend that you allow for appropriate connection draining on existing transactions and sessions. This step allows requests processed by the current deployment to be completed or terminated gracefully.

Testing strategies

The testing patterns discussed in this section are typically used to validate a service's reliability and stability over a reasonable period under a realistic level of concurrency and load.

Canary test pattern

In canary testing, you partially roll out a change and then evaluate its performance against a baseline deployment, as the following diagram shows.

The configuration for a canary test.

In this test pattern, you deploy a new version of your application alongside the production version. You then split and route a percentage of traffic from the production version to the canary version and evaluate the canary's performance.

You select the key metrics for the evaluation when you configure the canary. We recommend that you compare the canary against an equivalent baseline and not the live production environment.

To reduce factors that might affect your analysis (such as caching, long-lived connections, and hash objects), we recommend that you take the following steps for the baseline version of your application:

  • Ensure that the baseline and production versions of your application are identical.
  • Deploy the baseline version at the same time that you deploy the canary.
  • Ensure that the configuration of the baseline deployment (such as the number of application instances and autoscaling policies) matches the canary deployment.
  • Use the baseline version to serve the same traffic as the canary.
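One way to picture the traffic split is a stable, hash-based assignment in which the canary and the baseline each receive the same share of requests. The following is a minimal sketch under that assumption; choose_backend, the request ID format, and the 5% default are hypothetical choices for illustration, not part of any specific routing product.

```python
import hashlib


def choose_backend(request_id, canary_percent=5):
    """Map a request to "canary", "baseline", or "production".

    The canary and the baseline each receive `canary_percent` of requests.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    if bucket < canary_percent:
        return "canary"
    if bucket < 2 * canary_percent:
        return "baseline"
    return "production"


counts = {"canary": 0, "baseline": 0, "production": 0}
for n in range(10_000):
    counts[choose_backend(f"request-{n}")] += 1
print(counts)
```

Hashing the request ID keeps the assignment deterministic, so the same ID always reaches the same backend, and the equal shares let you compare canary metrics against a baseline that sees a comparable slice of traffic.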

In canary tests, partial rollout can follow various partitioning strategies. For example, if the application has geographically distributed users, you can roll out the new version to a region or a specific location first. For more information, see Automating canary analysis on GKE with Spinnaker and best practices for configuring a canary.

Key benefits

  • Ability to test live production traffic. Instead of testing an application by using simulated traffic in a staging environment, you can run canary tests on live production traffic. With canary rollouts, you need to decide in what increments you release the new application and when you trigger the next step in a release. The canary needs enough traffic so that monitoring can clearly detect any problems.
  • Fast rollback. You can roll back quickly by redirecting the user traffic to the older version of the application.
  • Zero downtime. Canary releases let you route the live production traffic to different versions of the application without any downtime.

Considerations

  • Slow rollout. Each incremental release requires monitoring for a reasonable period and, as a result, might delay the overall release. Canary tests can often take several hours.
  • Observability. A prerequisite to implementing canary tests is the ability to effectively observe and monitor your infrastructure and application stack. Implementing robust monitoring can require a substantial effort.
  • Backward compatibility and sticky sessions. As with rolling updates, canary testing can pose risks with backward incompatibility and session persistence because multiple application versions run in the environment while the canary is deployed.

A/B test pattern

With A/B testing, you test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.

When you perform an A/B test, you route a subset of users to new functionality based on routing rules, as the following diagram shows.

The configuration for an A/B test.

Routing rules often include factors such as browser version, user agent, geolocation, and operating system. After you measure and compare the versions, you update the production environment with the version that yielded better results.
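A routing rule of this kind can be sketched as a function from request attributes to a variant. The rule below (mobile user agents in two example countries) and the field names user_agent and country are hypothetical values chosen only to illustrate the idea.

```python
def select_variant(request):
    """Return "B" for requests that match the experiment's routing rules."""
    is_mobile = "Mobile" in request.get("user_agent", "")
    in_target_region = request.get("country") in {"DE", "FR"}
    return "B" if is_mobile and in_target_region else "A"


requests = [
    {"user_agent": "Mozilla/5.0 (iPhone) Mobile Safari", "country": "DE"},
    {"user_agent": "Mozilla/5.0 (X11; Linux x86_64)", "country": "DE"},
    {"user_agent": "Mozilla/5.0 (iPhone) Mobile Safari", "country": "US"},
]
variants = [select_variant(r) for r in requests]
print(variants)
```

Requests that match no rule fall through to variant A, so the existing experience remains the default.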

Key benefits

A/B testing is best used to measure the effectiveness of functionality in an application. Use cases for the deployment patterns discussed earlier focus on releasing new software safely and rolling back predictably. In A/B testing, you control your target audience for the new features and monitor any statistically significant differences in user behavior.

Considerations

  • Complex setup. A/B tests need a representative sample that can be used to provide evidence that one version is better than the other. You need to pre-calculate the sample size (for example, by using an A/B test sample size calculator) and run the tests for a reasonable period to reach a statistical confidence level of at least 95%.
  • Validity of results. Several factors can skew the test results, including false positives, biased sampling, or external factors (such as seasonality or marketing promotions).
  • Observability. When you run multiple A/B tests on overlapping traffic, the processes of monitoring and troubleshooting can be difficult. For example, if you test product page A versus product page B, or checkout page C versus checkout page D, distributed tracing becomes important to determine metrics such as the traffic split between versions.
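To give a sense of what pre-calculating the sample size involves, the sketch below uses the common normal-approximation formula for comparing two proportions. The 80% power value and the example conversion rates are assumptions for illustration; a dedicated calculator might apply different corrections.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect p_a versus p_b.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # alpha=0.05 means 95% confidence
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_a - p_b) ** 2)


n = sample_size_per_variant(0.10, 0.12)
print(f"users needed per variant: {n}")
```

Smaller expected differences between the variants drive the required sample size up quickly, which is one reason that A/B tests often need to run for an extended period.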

Shadow test pattern

Sequential experiment techniques like canary testing can potentially expose customers to an inferior application version during the early stages of the test. You can manage this risk by using offline techniques like simulation. However, offline techniques do not validate the application's improvements because there is no user interaction with the new versions.

With shadow testing, you deploy and run a new version alongside the current version, but in such a way that the new version is hidden from the users, as the following diagram shows.

The configuration for a shadow test.

An incoming request is mirrored and replayed in a test environment. This process can happen either in real time or asynchronously, when a copy of previously captured production traffic is replayed against the newly deployed service.

You need to ensure that the shadow tests do not trigger side effects that can alter the existing production environment or the user state.
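The mirroring idea can be sketched as a handler that always answers from the current version and only records what the shadow version did. This is a simplified, synchronous illustration; real setups typically mirror traffic asynchronously at the proxy or service mesh layer, and the names handle_with_shadow, current_version, and new_version are hypothetical.

```python
def handle_with_shadow(request, primary, shadow, mismatches):
    """Serve `request` from `primary` and mirror it to `shadow`."""
    response = primary(request)
    try:
        shadow_response = shadow(request)
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    except Exception as error:  # a shadow failure must never reach the user
        mismatches.append((request, response, repr(error)))
    return response


def current_version(request):
    return request.upper()


def new_version(request):
    if not request:
        raise ValueError("empty request")
    return request.upper() + "!"  # deliberate behavior difference


mismatches = []
responses = [
    handle_with_shadow(r, current_version, new_version, mismatches)
    for r in ["a", ""]
]
print(responses)
print(mismatches)
```

Users only ever receive the primary response; differences and shadow errors are collected for later analysis.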

Key benefits

  • Zero production impact. Because traffic is duplicated, any bugs in services that are processing shadow data have no impact on production.
  • Testing new backend features by using the production load. When used with tools such as Diffy, traffic shadowing lets you measure the behavior of your service against live production traffic. This ability lets you test for errors, exceptions, performance, and result parity between application versions.
  • Reduced deployment risk. Traffic shadowing is typically combined with other approaches like canary testing. After testing a new feature by using traffic shadowing, you then test the user experience by gradually releasing the feature to an increasing number of users over time. No full rollout occurs until the application meets stability and performance requirements.

Considerations

  • Side effects. With traffic shadowing, you need to be cautious in how you handle services that mutate state or interact with third-party services. For example, if you want to shadow test the payment service for a shopping cart platform, the customers could pay twice for their order. To avoid shadow tests that might result in unwanted mutations or other risk-prone interactions, we recommend that you use either stubs or virtualization tools such as Hoverfly instead of third-party systems or datastores.
  • Cost and operational overhead. Shadow testing is fairly complex to set up. Also, like blue/green deployments, shadow deployments carry cost and operational implications because the setup requires running and managing two environments in parallel.

Choosing the right strategy

You can deploy and release your application in several ways. Each approach has advantages and disadvantages. The best choice comes down to the needs and constraints of your business. Consider the following:

  • What are your most critical considerations? For example, is downtime acceptable? Do costs constrain you? Does your team have the right skills to undertake complex rollout and rollback setups?
  • Do you have tight testing controls in place, or do you want to test the new releases against production traffic to ensure the stability of the release and limit any negative impact?
  • Do you want to test features among a pool of users to cross-verify certain business hypotheses? Can you control whether targeted users accept the update? For example, updates on mobile devices require explicit user action and might require extra permissions.
  • Are microservices in your environment fully autonomous? Or, do you have a hybrid of microservice-style applications working alongside traditional, difficult-to-change applications? For more information, see deployment patterns on hybrid and multi-cloud environments.
  • Does the new release involve any schema changes? If yes, are the schema changes too complex to decouple from the code changes?

The following table summarizes the salient characteristics of the deployment and testing patterns discussed earlier in this document. When you weigh the advantages and disadvantages of various deployment and testing approaches, consider your business needs and technological resources, and then select the option that benefits you the most.

| Deployment or testing pattern | Zero downtime | Real production traffic testing | Releasing to users based on conditions | Rollback duration | Impact on hardware and cloud costs |
|---|---|---|---|---|---|
| Recreate: Version 1 is terminated, and Version 2 is rolled out. | No | No | No | Fast but disruptive because of downtime | No extra setup required |
| Rolling update: Version 2 is gradually rolled out and replaces Version 1. | Yes | No | No | Slow | Can require extra setup for surge upgrades |
| Blue/green: Version 2 is released alongside Version 1; the traffic is switched to Version 2 after it is tested. | Yes | No | No | Instant | Need to maintain blue and green environments simultaneously |
| Canary: Version 2 is released to a subset of users, followed by a full rollout. | Yes | Yes | No | Fast | No extra setup required |
| A/B: Version 2 is released, under specific conditions, to a subset of users. | Yes | Yes | Yes | Fast | No extra setup required |
| Shadow: Version 2 receives real-world traffic without impacting user requests. | Yes | Yes | No | Does not apply | Need to maintain parallel environments in order to capture and replay user requests |

Best practices

To keep deployment and testing risks to a minimum, application teams can follow several best practices:

  • Backward compatibility. When you run multiple application versions at the same time, ensure that the database is compatible with all active versions. For example, suppose that a new release requires a schema change to the database (such as a new column). In that scenario, you need to change the database schema so that it's backward compatible with the older version. After you complete a full rollout, you can remove support for the old schema, leaving support only for the newest version. One way to achieve backward compatibility is to decouple schema changes from the code changes. For more information, see parallel change and database refactoring patterns.
  • Continuous integration/continuous deployment (CI/CD). CI ensures that code checked into the feature branch merges with its main branch only after it successfully passes dependency checks, unit and integration tests, and the build process. Therefore, every change to an application is tested before it can be deployed. With CD, the CI-built code artifact is packaged and ready to be deployed in one or more environments. For more information, see building a CI/CD pipeline with Google Cloud.
  • Automation. If you continuously deliver application updates to the users, we recommend that you build an automated process that reliably builds, tests, and deploys the software. We also recommend that your code changes automatically flow through a CI/CD pipeline that includes artifact creation, unit testing, functional testing, and production rollout. By using automation tools such as Spinnaker, Jenkins, Travis CI, and Cloud Build, you can automate the deployment processes to be more efficient, reliable, and predictable.
  • Operating environments and configuration management. Tools like Vagrant and Packer can help you maintain consistent local development, staging, and production environments. You can also use configuration management tools like Puppet, Chef, or Ansible to automatically apply OS settings or apply patches in target servers. For more information, see building custom images with Jenkins and Packer for virtual machines and containers.
  • Rollback strategy. Create a rollback strategy to follow in case something goes wrong. Release automation tools like Spinnaker and Harness support rollbacks, or you can keep a backup of the last application version until you know the new one works properly.
  • Post-deployment monitoring. An application performance management tool can help your team monitor critical performance metrics. Create a process for alerting the responsible team when a build or deployment fails. Enable automated rollbacks for deployments that fail health checks, whether because of availability or error rate issues.
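As an illustration of the backward compatibility practice above, the following sketch applies an additive schema change so that an old and a new application version can write to the same table. It uses an in-memory SQLite database purely for demonstration; the table and column names are hypothetical, and a production migration would depend on your database and tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The old version (V1) writes only the columns it knows about.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Expand step: add the new column as nullable so that V1 writes keep working.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# V1 and V2 can now write side by side during the rollout.
conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("carol", "carol@example.com"),
)

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)
```

Because the new column is optional, the schema change can ship before the code change, and any stricter constraint can be added after the old version is fully retired.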

What's next