Creating web applications that are both resilient and scalable is an essential part of any application architecture. A well-designed application should scale seamlessly as demand increases and decreases, and be resilient enough to withstand the loss of one or more compute resources.
This document is for systems operations professionals familiar with Compute Engine. In this document, you learn how to use Google Cloud Platform (GCP) to build scalable and resilient application architectures using patterns and practices that apply broadly to any application. You also learn how these principles apply to real-world scenarios through an example deployment of the popular open source project management tool Redmine, a Ruby on Rails–based application. Later—in the section Deploying the example solution—you have the chance to deploy the application yourself and download all of the source for reference.
GCP lets you build and run web applications that are both scalable and resilient. You can use services such as Compute Engine and autoscaler to adjust your application's resources, as demand requires. And with Compute Engine’s pricing model, you pay on a per-second basis, and you automatically receive the best price with sustained use discounts without any complicated capacity or reservation planning.
For a general overview of your options for web hosting on GCP, see Serving websites.
Defining scalability and resilience
Before describing a sample application architecture, it's helpful to define the terms scalability and resilience.
Scalability: adjusting capacity to meet demand
A scalable application is one that works well with 1 user or 1,000,000 users, and gracefully handles peaks and dips in traffic automatically. By adding and removing virtual machines only when needed, scalable apps consume only the resources necessary to meet demand.
The following diagram shows how a scalable application responds to increases and decreases in demand.
Notice that capacity adjusts dynamically to account for changes in demand. This configuration, sometimes referred to as elasticity in design, helps ensure that you're paying only for the compute resources your application needs at a specific moment in time.
Resilience: designed to withstand the unexpected
A highly-available, or resilient, application is one that continues to function despite expected or unexpected failures of components in the system. If a single instance fails or an entire zone experiences a problem, a resilient application remains fault tolerant—continuing to function and repairing itself automatically if necessary. Because stateful information isn’t stored on any single instance, the loss of an instance—or even an entire zone—should not impact the application’s performance.
A truly resilient application requires planning from both a software development level and an application architecture level. This document primarily focuses on the application architecture level.
Designing an application architecture for a resilient application typically involves:
- Load balancers to monitor servers and distribute traffic to servers that can best handle the requests
- Hosting virtual machines in multiple regions
- Configuring a robust storage solution
GCP: flexible and cost-effective
Traditional architectures that support scalability and resilience often require significant investments in resources. With on-premises solutions, scalability often means deciding between over-spending on server capacity to handle peak usage, or purchasing only based on average need, risking poor application performance or user experience when traffic spikes. Resilience is more than just server capacity, however—location is also important. To mitigate the impact of physical events, such as severe storms or earthquakes, you must consider operating servers in different physical locations, which comes at a significant cost.
GCP offers an alternative: a set of cloud services that provide you a flexible way of adding scalability and resilience to your architecture. In addition, GCP provides these services by using a pricing structure that you can control.
Building resilient and scalable architectures with GCP
The following table shows how different GCP services map to the key requirements necessary to make applications scalable and resilient.
|Architecture requirement|GCP service|
|Load balancing|HTTP load balancing|
|Server hosting|Compute Engine, Regions and zones|
|Server management|Instance templates, Managed instance groups, Autoscaler|
|Data storage|Cloud SQL, Cloud Storage|
The following diagram shows how these GCP components work together to build a scalable, resilient application. The role each component plays is described in more detail in the next section.
Overview of components
Each component in the example application architecture plays a role in ensuring the application is both scalable and resilient. This section briefly describes each of these services. Later sections show how these services work together.
HTTP load balancer
The HTTP load balancer exposes a single public IP address that customers use to access the application. This IP address can be associated with a DNS A record or CNAME (for example, www.example.com). Incoming requests are distributed across the instance groups in each zone according to each group’s capacity. Within each zone, requests are spread evenly over the instances in the group. Although the HTTP load balancer can balance traffic across multiple regions, this example solution uses it in a single region with multiple zones.
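As a rough illustration of the distribution described above, the following Python sketch splits incoming requests across zonal instance groups in proportion to each group's capacity, then evenly across the instances in each group. The zone names, the `rps_per_instance` parameter, and the `distribute` helper are hypothetical, and the real load balancer's algorithm is more sophisticated than this.

```python
def distribute(total_rps, groups, rps_per_instance):
    """Split traffic across groups by capacity, then evenly per instance.

    groups maps a (hypothetical) zone name to its instance count.
    Returns {zone: (rps_for_group, rps_per_instance_in_group)}.
    """
    # A group's capacity is assumed to be instance count * per-instance rate.
    capacities = {zone: n * rps_per_instance for zone, n in groups.items()}
    total_capacity = sum(capacities.values())
    if total_capacity == 0:
        raise ValueError("no serving capacity available")

    result = {}
    for zone, capacity in capacities.items():
        group_rps = total_rps * capacity / total_capacity
        count = groups[zone]
        # An empty group receives no traffic, so avoid dividing by zero.
        per_instance = group_rps / count if count else 0.0
        result[zone] = (group_rps, per_instance)
    return result


shares = distribute(150, {"zone-a": 2, "zone-b": 1}, rps_per_instance=100)
```

In this example the larger group receives twice the traffic of the smaller one, but every instance ends up with the same share.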
A zone is an isolated location within a region. Zones have high-bandwidth, low-latency network connections to other zones in the same region. Google recommends deploying applications across multiple zones in a region.
An instance is a virtual machine hosted on Google’s infrastructure. You can install and configure these instances just like physical servers. In the example deployment, you use startup scripts and Chef to configure instances with the application server and code for the application.
Managed instance group
A managed instance group is a collection of homogeneous instances based on an instance template. A managed instance group can be targeted by an HTTP load balancer to spread work across instances in the group. A managed instance group has a corresponding instance group manager resource, which is responsible for adding and removing instances from the group.
The Compute Engine autoscaler adds instances to, or removes them from, a managed instance group by interfacing with the group’s manager in response to traffic, CPU utilization, or other signals. In the example solution, the autoscaler responds to the request-per-second (RPS) metric of the HTTP load balancer. An autoscaler is required for each managed instance group that you want scaled automatically.
Cloud SQL is a fully-managed database service that supports both MySQL and PostgreSQL. Replication, encryption, patches, and backups are managed by Google. A Cloud SQL instance is deployed to a single zone, and data is replicated to other zones automatically. The Redmine application used in this example is compatible with MySQL and works seamlessly with Cloud SQL.
Cloud Storage allows objects (usually files) to be stored and retrieved with a simple and scalable interface. In this solution, a Cloud Storage bucket is used to distribute private SSL keys to the scalable Compute Engine instances, and is also used to store all files uploaded to the Redmine application, meaning no stateful information is stored on any instance’s disks.
For this example architecture to be resilient, it needs to automatically replace instances that have failed or have become unavailable. When a new instance comes online, it should:
- Understand its role in the system.
- Configure itself automatically.
- Discover any of its dependencies.
- Start handling requests automatically.
To replace a failed instance automatically, you can use several Compute Engine components together:
A startup script runs when your instance boots up or restarts. You can use these scripts to install software and updates, to ensure that services are running within the virtual machine, or even install a configuration management tool, such as Chef, Puppet, Ansible, or Salt.
This scenario uses a startup script to install Chef Solo, which in turn further configures instances to work with the application. To learn about how you can use startup scripts and Chef Solo to automatically configure instances, see Appendix: Adding a new instance at the end of this topic.
In addition to a startup script, you need to define a few more items before you launch a Compute Engine instance. For example, you need to specify its machine type, the operating system image to use, and any disks that you want to attach. You define these options using an instance template.
Together, an instance template and a startup script define how to launch a Compute Engine instance and how to configure the software on that instance to fulfill a specific role in your application architecture.
Of course, an instance template is just that—a template. To put this template to work, you need a way to apply that template to new Compute Engine instances as they come online. To accomplish this, you create a managed instance group. You determine the number of instances you want running at any given time, and which instance template you want applied to those instances. An instance group manager is then responsible for launching and configuring those instances as needed.
The following diagram shows how these components work together:
- Startup script
- Instance template
- Instance group manager
- Managed instance group
A managed instance group and its corresponding instance group manager can be zone-specific or regional resources. An instance template is a project-level resource that can be reused across multiple managed instance groups in any zone, in any region; however, you might specify some zonal resources in an instance template, which restricts the use of that template to the zone where those zonal resources reside.
With startup scripts, instance templates, and managed instance groups, you now have a system that can replace unhealthy instances with new ones. In the next section, you'll see one way in which you can define what an unhealthy instance is and how to detect it.
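To make the replacement behavior concrete, here is a minimal Python sketch of the kind of reconciliation an instance group manager performs: drop instances marked unhealthy and launch replacements from the template until the target size is reached. The dictionaries, the `healthy` flag, and both function names are illustrative stand-ins, not Compute Engine APIs.

```python
import itertools

_ids = itertools.count(1)


def launch_from_template(template):
    """Stand-in for creating an instance from an instance template."""
    return {"name": f"{template['name_prefix']}-{next(_ids)}", "healthy": True}


def reconcile(instances, template, target_size):
    """Keep healthy instances and launch replacements up to target_size."""
    kept = [inst for inst in instances if inst["healthy"]]
    while len(kept) < target_size:
        kept.append(launch_from_template(template))
    return kept


group = [
    {"name": "web-a", "healthy": True},
    {"name": "web-b", "healthy": False},  # simulated failure
]
group = reconcile(group, {"name_prefix": "web"}, target_size=3)
```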
At this point, the example application has almost all the tools it needs to be resilient. However, there is one piece missing—it needs a way of identifying instances that are unhealthy so it knows it should replace them.
This application is designed to have users connect to an appropriate, healthy instance by using an HTTP load balancer. This architecture lets you use two services to identify instances that are capable of serving requests:
- Health checks. An HTTP health check specifies the port and path to execute the health check against on each instance. The health check expects a 200 OK response from a healthy instance.
- Backend services. A backend service defines one or more instance groups that receive traffic from a load balancer. The backend service specifies the port and protocol exposed by the instances, for example HTTP port 80, as well as the HTTP health check to be used against instances in the instance groups.
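The health check logic described above can be sketched in a few lines of Python. This is an illustration of the idea only, not how the load balancer implements it: the function names, the timeout, and the injectable `fetch` parameter (used here so the check can be exercised without a network) are all assumptions.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on


def is_healthy(host, port, path, fetch=http_status):
    """An instance counts as healthy only if the check path returns 200."""
    return fetch(f"http://{host}:{port}{path}") == 200
```

For example, `is_healthy("10.0.0.2", 80, "/")` would probe a hypothetical instance address on port 80.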
The following diagram shows the application architecture and how a backend service and HTTP health check relate to the load balancer and instance groups.
Data resilience with Cloud SQL
The three main areas of any application architecture are networking, computing, and storage. The application architecture described here has covered the networking and computing components, but to be complete, it must also address the storage component.
This example solution uses Cloud SQL First Generation instances to provide a fully managed MySQL database. With Cloud SQL, Google manages replication, encryption, patch management, and backups automatically.
A Cloud SQL database is region-wide, which means data is replicated across the zones within a region. This is equivalent to taking a backup of any updates to your data as they happen. In the unlikely event of a complete failure of a zone, data will be preserved.
Cloud SQL lets you choose between two replication types:
- Synchronous replication. With synchronous replication, updates are copied to multiple zones before returning to the client. This is great for reliability and availability in the event of major incidents, but makes writes slower.
- Asynchronous replication. Asynchronous replication increases write throughput by acknowledging writes once they are cached locally, but before copying the data to other locations. Asynchronous replication results in faster writes to the database because you don't have to wait for replication to finish. However, you might lose your latest updates in the unlikely event of a data center system failure within a few seconds of updating the database.
The Redmine application used in this solution uses synchronous replication because the workload isn't write intensive. You choose between synchronous and asynchronous replication depending on the write-performance and data durability requirements of your application.
The previous sections have shown how the example application uses GCP to create a resilient application. But resiliency isn't enough—scalability is also important. The application should work well for 1 user or 1,000,000 users, and its resources should increase or decrease with those users to be cost effective.
The idea that the application's resources can increase or decrease requires that it has:
- A means by which you can add or remove instances from service. You also need a way of deciding when an instance needs to be added, and when one should be removed. GCP's autoscaler solves this issue.
- A means of storing stateful data. Because instances can come and go, it isn't advisable to store stateful data on those instances. The application architecture solves this for the relational data by storing it in a separate Cloud SQL instance, but it also needs to account for user-uploaded files. Cloud Storage fulfills this requirement.
The following sections describe how to use the autoscaler to scale the infrastructure running the Redmine application and how to leverage Cloud Storage for uploaded files.
Scale with autoscaler
As use of the application ebbs and flows, it needs to dynamically adjust the resources it requires. You can solve this challenge with a Compute Engine autoscaler.
When traffic or load rises, the autoscaler adds resources to handle the extra activity and removes resources when the traffic or load lowers to help you reduce costs. Autoscaler performs these actions automatically, based on the scaling rules you define and without subsequent intervention on your part.
The impact of the autoscaler is twofold:
- Your users get a great experience using your application because there are always enough resources to meet demand.
- You maintain better control over your costs because the autoscaler removes instances when demand falls below a specified threshold.
The autoscaler can scale the number of virtual machines based on CPU utilization, serving capacity, or a Stackdriver Monitoring metric. This solution uses the serving capacity metric to add or remove Compute Engine instances based on the number of requests per second (RPS) the instances are receiving from the load balancer. See Batch Processing with Compute Engine autoscaler to learn more.
Requests per second (RPS)
Previous sections described a single backend service that identifies the instance groups to receive traffic from the load balancer. For each of the instance groups associated with the backend service, this example solution also sets balancingMode=RATE. This property instructs the load balancer to balance based on the RPS defined in the maxRatePerInstance property, which is set to 100 for this example. This configuration means the load balancer attempts to keep each instance at or below 100 RPS. To learn more about the configuration properties of a backend service, see the documentation for backend services.
To scale on RPS, you need to create an autoscaler for each instance group that you want scaled automatically. In this example, the instance group is a per-zone resource, so you need to create an autoscaler in each zone.
Each autoscaler includes a utilizationTarget property that defines the fraction of the load balancer's maximum serving capacity that the autoscaler maintains. This example sets each autoscaler’s utilizationTarget to 80% of the backend service’s maximum rate of 100 RPS for each instance. This means the autoscaler adds instances once the load exceeds 80% of the maximum rate per instance, which is 80 RPS, and removes instances when the load drops below that threshold.
Each autoscaler also defines a minimum and maximum number of instances, and the autoscaler keeps the group size within those bounds.
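The scaling rule described above can be approximated with the following Python sketch. It is a simplification, not the autoscaler's actual algorithm: it assumes load spreads evenly across instances, and the function name and the default minimum and maximum bounds are hypothetical values chosen for illustration.

```python
import math


def target_instance_count(total_rps, max_rate_per_instance=100,
                          utilization_target=0.8,
                          min_instances=2, max_instances=10):
    """Estimate how many instances keep each one at or below the target rate."""
    # 80% of 100 RPS gives a per-instance target of 80 RPS.
    per_instance_target = max_rate_per_instance * utilization_target
    needed = math.ceil(total_rps / per_instance_target)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_instances, min(max_instances, needed))


for rps in (10, 160, 250, 1000):
    print(rps, "RPS ->", target_instance_count(rps), "instances")
```

Low traffic is held at the minimum and very high traffic is capped at the maximum, which is the role the bounds play in the configuration described above.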
Handle file uploads
Part of the Redmine application’s functionality includes letting users upload and save files when logged in. The default behavior of Redmine and many other similar applications is to store those files directly on local disk. This approach is fine if you have just one server with a well-defined backup mechanism. However, this isn't the optimal approach when you have multiple, automatically scaled Compute Engine instances behind a load balancer. If a user uploads a file, there’s no guarantee that the next request will land on the machine where the files are stored. There's also no guarantee the autoscaler won't terminate an unneeded instance that has files on it.
A better solution is to use Cloud Storage, which provides a centralized location perfect for storing and accessing file uploads from an automatically scaled fleet of web servers. Cloud Storage also exposes an API that is interoperable with Amazon S3 clients, making it compatible with existing application plugins for Amazon S3, including the Redmine S3 plugin, without any modifications. Many third-party and open source applications have plugins to support object stores like Cloud Storage. If you’re building your own application, you can use the Cloud Storage API directly to support storing files.
The following diagram shows the flow for uploading (blue arrows) and retrieving (green arrows) files using Redmine and Cloud Storage:
The process shown in the diagram is as follows:
- The user POSTs the file from a web browser.
- The load balancer chooses an instance to handle the request.
- The instance stores the file in Cloud Storage.
- The instance stores file metadata, such as the name, owner, and its location in Cloud Storage, in the Cloud SQL database.
- When a user requests a file, the file is streamed from Cloud Storage to an instance.
- The instance sends the stream through the load balancer.
- The file is sent to the user.
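The flow above can be sketched in Python with in-memory stand-ins for the two storage services. The `ObjectStore` and `MetadataDB` classes, the key layout, and the handler names are hypothetical and do not reflect the Redmine S3 plugin or any Cloud Storage client library. The point is only that file bytes and metadata live outside the instance, so any instance can serve any file.

```python
import uuid


class ObjectStore:
    """In-memory stand-in for a Cloud Storage bucket."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class MetadataDB:
    """In-memory stand-in for the Cloud SQL table of file metadata."""

    def __init__(self):
        self._rows = {}

    def insert(self, file_id, row):
        self._rows[file_id] = row

    def lookup(self, file_id):
        return self._rows[file_id]


def handle_upload(store, db, name, owner, data):
    """Store the bytes in the object store, then record where they went."""
    file_id = uuid.uuid4().hex
    key = f"uploads/{file_id}/{name}"
    store.put(key, data)
    db.insert(file_id, {"name": name, "owner": owner, "location": key})
    return file_id


def handle_download(store, db, file_id):
    """Look up the file's location, then read the bytes back."""
    row = db.lookup(file_id)
    return store.get(row["location"])


store, db = ObjectStore(), MetadataDB()
file_id = handle_upload(store, db, "plan.pdf", "alice", b"%PDF-1.4 ...")
content = handle_download(store, db, file_id)  # could run on another instance
```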
In addition to removing stateful file uploads from Compute Engine instances and letting them scale dynamically, Cloud Storage provides redundant, durable storage for a virtually infinite number of file uploads. This storage solution is resilient, scalable, and cost-effective—you pay only for the storage that you use without worrying about capacity planning, and data is automatically stored redundantly across multiple zones.
So far, the application architecture described in this document shows how to build a resilient and scalable application using GCP. However, it's not enough to be able to build an app—you need to be able to build it in a way that is as cost-effective as possible.
This section demonstrates how the application architecture described in this document isn't only resilient and scalable, but also highly cost effective. It starts by making some general assumptions about how heavily and frequently the application is used, and then converts those assumptions into a basic cost estimate. Keep in mind that these are only assumptions. Feel free to adjust these numbers as necessary to create a cost estimate that more closely matches the anticipated usage of your own applications.
A primary concern for any application architecture is how much it costs to keep the servers running. This cost analysis uses the following assumptions:
|Average page views per month|20,000,000|
|Average HTTP requests per month|120,000,000|
|Peak hours (90% or greater) of usage|7:00am to 6:00pm Monday through Friday|
|Data transfer per page view|100 KB|
|Peak hours per month|220|
|Request rate during peak hours|127 requests/second (RPS)|
|Request rate during off-peak hours|6 requests/second (RPS)|
Based on these assumptions, you can figure out how many HTTP requests the application receives during the peak hours of 7:00am to 6:00pm Monday through Friday each month:
120,000,000 (requests each month) * 90% (occurring at peak hours) = 108,000,000 (requests at peak hours each month)
On average, there are 22 work days each month. If each workday has 11 peak hours in it, then you need to provide enough compute resources to handle 242 peak hours each month.
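The arithmetic in the two preceding paragraphs can be reproduced with a short Python sketch. The variable names are illustrative. The average request rate at the end is derived here from the figures above and is not a number taken from the original cost estimate.

```python
requests_per_month = 120_000_000
peak_share = 0.90          # share of requests arriving during peak hours
workdays_per_month = 22
peak_hours_per_day = 11    # 7:00am to 6:00pm

peak_requests = round(requests_per_month * peak_share)
peak_hours = workdays_per_month * peak_hours_per_day

# Average rate if peak traffic were spread evenly over every peak hour.
avg_peak_rps = peak_requests / (peak_hours * 3600)

print(f"{peak_requests:,} peak requests over {peak_hours} peak hours")
print(f"average peak rate: about {avg_peak_rps:.0f} RPS")
```

The resulting average lands in the same range as the peak request rate listed in the assumptions table. Real traffic is not perfectly even, so short bursts can exceed it.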
Next, you need to figure out what type of Compute Engine instance can handle this type of traffic. Basic load testing of this application architecture determined that 4 Compute Engine instances of type n1-standard-1 would be sufficient. For non-peak hours, this solution keeps a minimum of two instances running.
To see how much it costs to run these instances, check out the latest price estimates on the GCP Pricing Calculator. When you do, notice that, in both cases, these instances automatically qualify for sustained use discounts.
Load balancing and data transfer
This application provisioned a load balancer with a single forwarding rule, which is the public IP address that users connect to. That forwarding rule is billed on an hourly basis.
For data transfer estimates, consider a worst-case scenario first. The load balancer charges for ingress data that it processes, and normal egress rates are charged for traffic outbound from the load balancer. Assume that 99.5% of the 120,000,000 HTTP requests are users loading a Redmine project page. Loading a page counts for 1 HTTP GET request, which then causes 5 more HTTP GET requests to load other assets (CSS, images, and jQuery), so loading an entire page involves 6 HTTP requests. This results in:
- Approximately 20,000,000 complete page loads per month
- About 10 KB of ingress data processed and 450 KB of data transfer per page
- An approximate total of 214 GB of data processed by the load balancer each month and 9,091 GB of egress traffic
The other 0.5% of the 120,000,000 HTTP requests are HTTP POST requests that upload a file of average size (about 0.5 MB), for an additional 500 GB of data processed each month.
This GCP Pricing Calculator estimate shows the anticipated cost for the 714 GB of data that the load balancer would process and the 9,091 GB of egress traffic in this scenario.
That data transfer estimate was a worst case scenario because it's serving all of the content—including static assets—on each request from a Compute Engine instance and through a load balancer, without the benefit of caching or a content delivery network (CDN). Of the roughly 450 KB payload for each page load—and recall this solution is based on over 20M page loads each month—333 KB of that is required to load jQuery. By simply updating one line of the application to load jQuery from Google hosted libraries, you reduce data transfer costs by 73%.
This updated price estimate shows the savings in data transfer achieved by switching to the Google hosted libraries.
This solution uses Cloud Storage for all files uploaded through the Redmine application. As described in the previous section, about 0.5% of this usage is to upload files, with each file averaging about 0.5 MB in size. This means you can expect to see 1,000,000 new file uploads each month, resulting in 500 GB of new storage each month. This solution also assumes 1,000,000 operations each month to store new files, which are charged as Class A operations.
This price estimate from the GCP Pricing Calculator shows the anticipated cost for using Cloud Storage.
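As a quick check of the storage figures above, the following Python sketch derives the monthly storage growth from the upload count and average file size. It assumes decimal units (1 GB = 1,000 MB) and, for the cumulative helper, that uploaded files are never deleted. Both are simplifications, and the names are illustrative.

```python
uploads_per_month = 1_000_000
avg_upload_mb = 0.5

# New storage added each month, in GB (decimal units assumed).
new_storage_gb = uploads_per_month * avg_upload_mb / 1000


def stored_gb_after(months):
    """Cumulative storage, assuming nothing is ever deleted."""
    return new_storage_gb * months


print(f"{new_storage_gb:.0f} GB of new storage each month")
print(f"{stored_gb_after(12):.0f} GB stored after one year")
```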
This architecture uses Cloud SQL to store all relational data for the application. Based on the example metrics described earlier, the D2 database type with 1024 MB RAM should provide sufficient capacity for the application workload, and will be running 24 hours a day, 7 days a week. Because this database will likely see heavy usage, choose the Heavy option for I/O operations in the calculator. A test for this example architecture was made by inserting 100,000 documents, the results of which determined that a 50 GB disk will support over 100,000,000 documents, allowing the database to support more than 8 years of use at the described rate.
Here is an estimate from the GCP Pricing Calculator that shows the anticipated cost for this part of the architecture.
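The "more than 8 years" figure for the database can be reproduced with a short calculation. This sketch assumes that "the described rate" refers to the 1,000,000 new file uploads per month from the Cloud Storage estimate, with one database document per upload. That reading is an interpretation, not something the load test itself states.

```python
capacity_documents = 100_000_000  # what a 50 GB disk supported in the test
documents_per_month = 1_000_000   # assumed: one document per monthly upload

months_of_capacity = capacity_documents / documents_per_month
years_of_capacity = months_of_capacity / 12

print(f"{months_of_capacity:.0f} months, about {years_of_capacity:.1f} years")
```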
Deploying the example solution
To deploy the example application described in this solution, visit the GitHub repository Scalable and resilient applications on GCP.
Appendix: Adding a new instance
As part of your efforts to create a resilient and scalable application architecture, you need to decide on how you want to add new instances. Specifically, you need to determine how to automatically configure new instances as they come online.
In this section, you look at a few of the available options.
Bootstrapping software installation
To serve a user’s web request, each instance needs some additional software installed on top of the base operating system, along with configuration data. The configuration data includes the database connection information and the name of the Cloud Storage bucket that files are stored in. If you imagine these components as layers, you can visualize the entire stack that runs on each instance:
This solution uses an instance template, which specifies the Compute Engine image that instances use when launched. Specifically, this solution uses the Ubuntu 14.10 image developed and supported by Canonical. Because this is the base operating system image, it doesn’t include any of the software or configuration required by the application.
To get the rest of the stack automatically installed when a new instance comes up, you can use a combination of Compute Engine startup scripts and Chef Solo to bootstrap at launch time.
You can specify a startup script by adding a startup-script metadata attribute item to an instance template. A startup script runs when the instance boots up.
This startup script:
- Installs the Chef client.
- Downloads a special Chef file called node.json. This file tells Chef which configuration to run for this instance.
- Runs Chef and lets it take care of the detailed configuration.
Here is the startup script in its entirety:
    #!/bin/bash

    # Install Chef
    curl -L https://www.opscode.com/chef/install.sh | bash

    # Download node.json (runlist)
    curl -L https://github.com/googlecloudplatform/... > /tmp/node.json

    # Run Chef
    chef-solo -j /tmp/node.json -r https://github.com/googlecloudplatform/...
Providing application configuration
After a new instance boots and configures itself using the startup script and Chef, it needs to know some information before it can begin servicing requests. In this example, each instance needs to know database connection information, such as the hostname, username, and password, as well as the name of the Cloud Storage bucket to use and credentials to connect with.
Every Compute Engine instance has metadata attributes associated with it that you can define. Earlier, you learned about adding the startup-script metadata attribute, but you can also add arbitrary key-value pairs. Here you can specify attributes in the instance template to include the configuration data the instances need to connect to the database and Cloud Storage bucket.
Here’s what the metadata for an instance template looks like from the GCP Console:
Chef parses these bits of configuration information from the instance’s metadata and uses them to populate templates that create the configuration files the application needs. Here’s the template that creates the database.yaml file containing the database connection info, accessing the appropriate metadata items automatically:
    production:
      adapter: mysql2
      database: <%= node['gce']['instance']['attributes']['dbname'] %>
      host: <%= node['gce']['instance']['attributes']['dbhost'] %>
      username: <%= node['gce']['instance']['attributes']['dbuser'] %>
      password: <%= node['gce']['instance']['attributes']['dbpassword'] %>
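To show what the template above produces, here is a rough Python equivalent that fills the same four fields from a dictionary of metadata attributes. It uses the standard library's `string.Template` rather than Chef's template engine, and the attribute values are made up for illustration.

```python
from string import Template

DATABASE_TEMPLATE = Template(
    "production:\n"
    "  adapter: mysql2\n"
    "  database: $dbname\n"
    "  host: $dbhost\n"
    "  username: $dbuser\n"
    "  password: $dbpassword\n"
)


def render_database_config(attributes):
    """Fill the template; raises KeyError if an attribute is missing.

    Values are inserted as-is, without YAML quoting or escaping.
    """
    return DATABASE_TEMPLATE.substitute(attributes)


# Hypothetical attribute values, standing in for instance metadata.
config = render_database_config({
    "dbname": "redmine",
    "dbhost": "10.0.0.5",
    "dbuser": "redmine_app",
    "dbpassword": "example-password",
})
print(config)
```

Failing loudly on a missing attribute is a deliberate choice in this sketch: an instance with an incomplete database configuration should not start serving requests.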
You can also manually access metadata values from within an instance by using the local metadata service. Here you can use curl to retrieve the database password:

    curl "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dbpassword" -H "Metadata-Flavor: Google"
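The same lookup can be made from application code. The following Python sketch builds the equivalent request with the standard library. The URL and the `Metadata-Flavor: Google` header come from the curl command above, while the helper names and the timeout are assumptions. Calling `read_attribute` only works from inside a Compute Engine instance.

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"
)


def metadata_request(attribute):
    """Build a request for one custom metadata attribute."""
    return urllib.request.Request(
        METADATA_URL + attribute,
        headers={"Metadata-Flavor": "Google"},  # required by the metadata service
    )


def read_attribute(attribute, timeout=2.0):
    """Fetch the attribute value as text (only works on an instance)."""
    request = metadata_request(attribute)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `read_attribute("dbpassword")` would return the same value as the curl command.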
Performance and dependency considerations
The bootstrapping approach taken in this solution involves starting with a default operating system image, installing all software at launch time with Chef, and using instance metadata to provide app config data.
An advantage of this approach is that the system’s configuration is specified in a Chef cookbook. The cookbook can be version controlled, shared, and used to provision virtual machines locally for testing, using Vagrant or Docker, or to configure servers in your data center or with different cloud providers. Image management is also simplified: in the case of this example application, you only need to track the one base OS image the application uses.
Some disadvantages to consider include potentially slow launch times, as all software is being downloaded and installed—in some cases requiring compilation. It is also important to consider dependencies this method introduces: in this example, Chef installed a number of packages from apt, Rubygems, and GitHub. If any of those repositories are unavailable while a new instance is starting up, its configuration will fail.
Custom images and bootstrapping
Because you can create your own custom images with Compute Engine, installing everything at launch time isn’t the only approach to bootstrapping. For example, you might:
- Launch a base Ubuntu 14.10 image.
- Install everything except the Redmine app (Ruby, nginx, and so on).
- Create an image from the result.
- Use that image in the instance template.
Now when a new instance is launched, it only needs to install Redmine. Boot time is improved and you've reduced the number of external package dependencies.
You could take the custom image approach even further and bake absolutely everything into an image, including all dependencies, application source, and configuration. This has the advantage of fastest boot time and zero external dependencies, but now if anything at all changes in your application, you have to create a new image and update the instance template.
Consider the approaches to bootstrapping an instance as being on a continuum. More configuration at launch time means slower boot times but fewer images to manage. More configuration baked into a custom image means faster boot times and fewer dependencies, but potentially many more images to manage. For most customers, the right approach is a compromise somewhere in the middle. Choose what makes sense for you and your application.