Compute

Let Google Cloud’s predictive services autoscale your infrastructure

July 1, 2021

Pawel Wenda

Product Manager

At Google Cloud, we believe you get the most benefit from the cloud when you scale infrastructure based on changing demand. Compute Engine lets you configure autoscaling to save costs during periods of low demand and add capacity to support peak loads.

When you use a managed instance group (MIG), you can have an autoscaler automatically create or delete virtual machine (VM) instances based on increases or decreases in load. However, if your application takes several minutes to initialize, creating VMs in response to growing load might not increase your application's capacity quickly enough. For example, if there's a large increase in load (like when users first wake up in the morning), some users might experience delays while your application is initializing on new instances.

A good way to solve this problem would be to create VMs ahead of demand so that your application has enough time to initialize beforehand. This requires knowing upcoming demand. If only we could predict the future… Well, now we can!

Introducing predictive autoscaling

Predictive autoscaling uses Google Cloud's machine learning capabilities to forecast capacity needs. It creates VMs ahead of growing demand, allowing enough time for your application to initialize.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Figure_1_Q7lB0jb.max-1300x1300.jpg

Figure 1. Autoscaling creates VMs as demand grows, leaving no buffer for the application to initialize. Predictive autoscaling creates VMs ahead of demand, allowing enough time for your application to initialize and start serving new load.

How does it work?

Predictive autoscaling uses your instance group’s CPU history to forecast future load and calculate how many VMs are needed to meet your target CPU utilization. Our machine learning adjusts the forecast based on recurring load patterns for each MIG.
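To make the sizing step concrete, here is a minimal sketch of how a forecasted load could be turned into a VM count for a given target CPU utilization. It is an illustration only, not the autoscaler's actual implementation: the function name `vms_needed` and the convention of expressing load in "fully busy VM" units are assumptions made for this example.

```python
import math


def vms_needed(forecast_load: float, target_utilization: float) -> int:
    """Return how many VMs keep average CPU at or below the target.

    forecast_load is the predicted CPU demand in "fully busy VM" units
    (an assumption of this sketch): 5.0 means five VMs' worth of CPU.
    target_utilization is a fraction, for example 0.7 for 70%.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if forecast_load < 0:
        raise ValueError("forecast_load must not be negative")
    # Round up: a fractional VM cannot be created.
    return math.ceil(forecast_load / target_utilization)


print(vms_needed(5.0, 0.7))  # 5 / 0.7 is about 7.14, so 8 VMs
```

With a 70% target, five VMs' worth of forecasted CPU demand needs eight VMs, because each VM is only meant to run at 70% of its capacity.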

You can specify how far in advance you want the autoscaler to create new VMs by configuring the application initialization period. For example, if your app takes 5 minutes to initialize, the autoscaler will create new instances 5 minutes ahead of the anticipated load increase. This allows you to keep your CPU utilization within the target and keep your application responsive even when there’s high growth in demand.
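The look-ahead idea can be sketched in a few lines. This example assumes a per-minute forecast of required VM counts and, as a simplification of our own, requests the highest count seen anywhere within the initialization window. The function name `size_to_request` and the per-minute list are hypothetical and do not reflect the autoscaler's internals.

```python
from typing import Sequence


def size_to_request(forecast_vms: Sequence[int], now_min: int, init_min: int) -> int:
    """Return the group size to request at minute `now_min`.

    forecast_vms[i] is the forecasted number of VMs needed at minute i.
    VMs requested now are ready `init_min` minutes later, so the sketch
    looks at every minute from now through now + init_min and takes the
    largest value (an assumption made for illustration).
    """
    window = forecast_vms[now_min : now_min + init_min + 1]
    if not window:
        raise ValueError("no forecast available for the requested window")
    return max(window)


# Demand is forecast to jump from 4 to 10 VMs at minute 5.
forecast = [4, 4, 4, 4, 4, 10, 10]
print(size_to_request(forecast, now_min=0, init_min=5))  # 10: start VMs now
print(size_to_request(forecast, now_min=0, init_min=4))  # 4: jump not yet in view
```

With a 5-minute initialization period, the jump at minute 5 is already visible at minute 0, so the new VMs have time to finish initializing before the load arrives.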

Many of our customers have different capacity needs during different times of the day or different days of the week. Our forecasting model understands weekly and daily patterns to cover for these differences. For example, if your app usually needs less capacity on the weekend, our forecast will capture that. Or, if you have higher capacity needs during working hours, we also have you covered.
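To build intuition for weekly and daily patterns, here is a deliberately simple stand-in for a forecasting model: it averages historical load by day of week and hour of day. This is not the model predictive autoscaling uses. The names `weekly_profile` and `forecast` and the sample values are invented for this sketch.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_profile(samples):
    """Average historical load per (weekday, hour) bucket.

    samples is an iterable of (datetime, load) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, load in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(load)
    return {key: mean(values) for key, values in buckets.items()}


def forecast(profile, when, default=0.0):
    """Look up the typical load for the weekday and hour of `when`."""
    return profile.get((when.weekday(), when.hour), default)


history = [
    (datetime(2021, 6, 21, 9), 8.0),   # Monday, 09:00
    (datetime(2021, 6, 28, 9), 10.0),  # Monday, 09:00
    (datetime(2021, 6, 26, 9), 2.0),   # Saturday, 09:00
]
profile = weekly_profile(history)
print(forecast(profile, datetime(2021, 7, 5, 9)))  # Monday morning: 9.0
print(forecast(profile, datetime(2021, 7, 3, 9)))  # Saturday morning: 2.0
```

Even this naive profile separates a busy Monday morning from a quiet Saturday morning, which is the kind of recurring difference the forecast is meant to capture.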

Why should you try it?

Predictive autoscaling continuously adapts forecasted capacity to best match upcoming demand. Autoscaler checks the forecast several times per minute and creates or deletes VMs to match its prediction. The forecast itself is updated every few minutes to match recent load trends, so if your growth rate is higher or lower than usual, we adjust the forecast accordingly. This gives you the capacity needed to cover peak load while saving on cost when demand goes down.

You can start using predictive autoscaling without worry, as it's fully compatible with the current autoscaler. Autoscaler will calculate enough VMs to cover both forecasted and real-time CPU load, whichever is higher. This works with other autoscaling features as well: you can scale based on a schedule, your load balancer request target, or Cloud Monitoring metrics. Autoscaler provides enough capacity for all of your configurations by taking the highest number of VMs needed to meet all your targets.
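The "highest number wins" rule can be sketched as follows. The signal names and the `recommended_size` function are hypothetical, and clamping the result between a minimum and maximum group size is an assumption added for the example.

```python
from typing import Mapping


def recommended_size(signals: Mapping[str, int], min_size: int, max_size: int) -> int:
    """Pick the largest VM count requested by any signal.

    signals maps a signal name to the number of VMs that signal alone
    would need. The result is clamped to [min_size, max_size].
    """
    highest = max(signals.values(), default=min_size)
    return max(min_size, min(max_size, highest))


signals = {
    "predictive_cpu": 12,  # forecasted CPU load
    "realtime_cpu": 9,     # CPU load observed right now
    "schedule": 5,         # capacity reserved by a scaling schedule
}
print(recommended_size(signals, min_size=2, max_size=20))  # 12
print(recommended_size(signals, min_size=2, max_size=10))  # 10, capped by max_size
```

Because the largest requirement always wins, turning on the predictive signal can only raise the recommended size relative to your existing configuration; it never lowers it.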

Getting started

You can enable predictive autoscaling in the Google Cloud Console. Select an autoscaled MIG from the instance groups page and click Edit group. Change the predictive autoscaling configuration from Off to Optimize for availability.

https://storage.googleapis.com/gweb-cloudblog-publish/images/compute_google_console.max-1500x1500.jpg

To better understand whether predictive autoscaling is good for your application, click the link See if predictive autoscaling can optimize your availability. This will show you a comparison of the last seven days with your current autoscaling configuration vs. with predictive autoscaling enabled.

https://storage.googleapis.com/gweb-cloudblog-publish/images/instance_group_autoscaling.max-1300x1300.jpg

In the chart above:

Average VM minutes overloaded per day shows how often your VMs exceed your CPU utilization target. This happens when demand is higher than available capacity. Predictive autoscaling can reduce this by starting VMs ahead of anticipated load.
Average VMs per day is a proxy for cost. This shows how much additional VM capacity you need to keep your CPU utilization within the target you have set. You can optimize your cost by adjusting Minimum instances and CPU utilization as explained below.
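As a rough illustration of what these two numbers measure, the sketch below computes them from per-minute utilization samples. The Console's exact calculation is not described here, so treat the data layout and both function names as assumptions.

```python
from typing import Sequence


def overloaded_vm_minutes_per_day(
    per_minute_utilization: Sequence[Sequence[float]], target: float, days: float
) -> float:
    """Count VM-minutes above the CPU target, averaged per day.

    per_minute_utilization[i] holds the CPU utilization (0 to 1) of every
    VM that was running during minute i.
    """
    overloaded = sum(
        1 for minute in per_minute_utilization for value in minute if value > target
    )
    return overloaded / days


def average_vms(per_minute_utilization: Sequence[Sequence[float]]) -> float:
    """Average number of running VMs across all sampled minutes."""
    total = sum(len(minute) for minute in per_minute_utilization)
    return total / len(per_minute_utilization)


samples = [[0.5, 0.6], [0.8, 0.9, 0.4], [0.75]]
print(overloaded_vm_minutes_per_day(samples, target=0.7, days=1))  # 3.0
print(average_vms(samples))  # 2.0
```

In the sample data, three VM-minutes exceed the 70% target and the group averages two running VMs, mirroring the availability and cost sides of the comparison.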

Optimizing your configuration

Make sure your Cool down period reflects how long it takes for your application to initialize, from VM boot time until it's ready to serve the load. Predictive autoscaling will use this value to start VMs ahead of forecasted load. If you set it to 10 minutes (600 seconds), your VMs will start 10 minutes before the load is expected to increase.

Review your autoscaling CPU utilization target and Minimum number of instances. With predictive autoscaling, you no longer need a buffer to compensate for the time it takes for a VM to start. If your application works best at 70% CPU utilization, you don't need to set the target to a much lower value, as predictive autoscaling will start VMs ahead of usual load. A higher CPU utilization target and a lower Minimum number of instances allow you to reduce cost, as you don't need to pay for additional capacity to prepare for growing demand.
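A quick back-of-the-envelope calculation shows why a higher target saves money. The sketch compares the VM count at a buffered target with the count at the target your application actually prefers. The peak load of 13 "fully busy VM" units and the 50% buffered target are made-up numbers for illustration.

```python
import math


def compare_targets(peak_load: float, buffered_target: float, direct_target: float):
    """Return (VMs at buffered target, VMs at direct target, fraction saved).

    peak_load is CPU demand in "fully busy VM" units (an assumption of
    this sketch). Targets are fractions such as 0.5 or 0.7.
    """
    buffered = math.ceil(peak_load / buffered_target)
    direct = math.ceil(peak_load / direct_target)
    return buffered, direct, (buffered - direct) / buffered


buffered, direct, saved = compare_targets(13.0, buffered_target=0.5, direct_target=0.7)
print(buffered, direct)  # 26 19
print(f"{saved:.0%} fewer VMs at peak")  # 27% fewer VMs at peak
```

In this example, raising the target from 50% to 70% cuts the peak group size from 26 to 19 VMs. Your own savings depend on your load shape, so it is worth checking the comparison in the Console before changing the target.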

Try predictive autoscaling today

Predictive autoscaling is generally available across all Google Cloud regions. For more information on how to configure, simulate and monitor predictive autoscaling, consult the documentation.
