Google Cloud Platform

One year of Cloud Performance Atlas

In March of this year, we kicked off a new content initiative called Cloud Performance Atlas, where we highlight best practices for GCP performance and show how to solve the most common performance issues that cloud developers run into.

Here are the top topics from 2017 that developers found most useful.

5. The bandwidth delay problem

Every now and again, I’ll get a question from a company that recently upgraded the bandwidth of the connection between its on-premises systems and Google Cloud but, for some reason, isn’t seeing any better performance as a result. The issue, as we’ve seen multiple times, usually resides in an area of TCP called “the bandwidth delay problem.”

TCP transfers data in packets between two endpoints. The sender transmits packets, and the receiver returns acknowledgement (ACK) packets for the data it has received. To get maximum performance, the connection between the two endpoints has to be tuned so that neither side sits idle, for example with the sender waiting around for acknowledgements of earlier packets before it can send more.

The most common way to address this problem is to adjust the TCP window size to match the connection: the window should be at least the bandwidth-delay product, that is, the link bandwidth multiplied by the round-trip time. This lets the sender keep transmitting data while the ACKs for earlier packets are still on their way back, leaving no gaps and achieving maximum throughput. Because at most one window of data can be in flight per round trip, a small window will limit your connection throughput regardless of the available or advertised bandwidth between instances.
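To make the arithmetic concrete, here is a minimal Python sketch of the two relationships above: the bandwidth-delay product tells you how large the window must be, and window divided by round-trip time gives the throughput ceiling a given window imposes. The 10 Gbps link and 30 ms round-trip time are assumed example values, not measurements from the article.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    # Bytes that must be unacknowledged ("in flight") to keep the link busy.
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    # A sender can push at most one full window per round trip.
    return window_bytes * 8 / rtt_s


def effective_throughput_bps(bandwidth_bps: float, window_bytes: float, rtt_s: float) -> float:
    # The real ceiling is whichever is smaller: the link itself or the window limit.
    return min(bandwidth_bps, max_throughput_bps(window_bytes, rtt_s))


if __name__ == "__main__":
    link_bps = 10e9   # assumed 10 Gbps connection
    rtt_s = 0.030     # assumed 30 ms round trip
    bdp = bandwidth_delay_product_bytes(link_bps, rtt_s)
    print(f"Window needed to fill the link: {bdp / 1e6:.1f} MB")
    for window in (64 * 1024, 4 * 1024**2, 64 * 1024**2):
        mbps = effective_throughput_bps(link_bps, window, rtt_s) / 1e6
        print(f"window {window:>10} bytes -> {mbps:,.1f} Mbps")
```

With these assumed numbers, a 64 KiB window caps the transfer at roughly 17.5 Mbps even though the link can carry 10 Gbps; the window has to grow to about 37.5 MB before the link, rather than TCP, becomes the bottleneck.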

Find out more by checking out the video or the article!

4. Improving CDN performance with custom keys

Google Cloud boasts a powerful CDN that uses points of presence around the globe to get your data to users as fast as possible.

When setting up Cloud CDN for your site, one of the most important things is to make sure you’re using the right custom cache keys, which control how incoming requests map to cached entries. In most cases the defaults work fine, but if you run a large site that serves the same content over multiple protocols (HTTP and HTTPS), a cache key that includes the protocol stores a separate copy for each one, and your cache fill costs can climb higher than expected.
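To illustrate why the protocol matters, here is a simplified Python model of how a cache key might be assembled from a request URL. The parameter names loosely mirror Cloud CDN’s options for including the protocol, host, and query string in the key, but the `cache_key` function itself is a hypothetical illustration, not Cloud CDN’s actual implementation.

```python
from urllib.parse import urlsplit


def cache_key(url: str, include_protocol: bool = True, include_host: bool = True,
              include_query_string: bool = True) -> str:
    # Hypothetical, simplified model: build a key from the selected URL parts.
    parts = urlsplit(url)
    key = ""
    if include_protocol:
        key += parts.scheme + "://"
    if include_host:
        key += parts.netloc
    key += parts.path
    if include_query_string and parts.query:
        key += "?" + parts.query
    return key


if __name__ == "__main__":
    # The same asset requested over both protocols.
    requests = [
        "http://example.com/static/logo.png",
        "https://example.com/static/logo.png",
    ]
    default_keys = {cache_key(u) for u in requests}
    shared_keys = {cache_key(u, include_protocol=False) for u in requests}
    # Each distinct key is a separate cache entry, and so a separate cache fill.
    print(len(default_keys), "entries with protocol in the key")
    print(len(shared_keys), "entry with protocol excluded")
```

In this model, dropping the protocol from the key lets HTTP and HTTPS requests share one cached copy, halving the cache fills for content served over both. The trade-off is that you should only exclude the protocol when the responses really are identical across protocols.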

You can see how we helped a sports website get its CDN keys just right in the video and article.

3. Google Cloud Storage and the sequential filename challenge

Google Cloud Storage (https://cloud.google.com/storage/) is a one-stop shop for all your content serving needs. However, one developer kept running into slow upload speeds when pushing their content into the cloud.

The issue was that Cloud Storage uses the names of the objects being uploaded (the file path and name) to distribute the upload workload across multiple servers, which normally improves performance. As we found out, if those file names are sequential (timestamps or incrementing counters, for example), you can end up in a situation where multiple connections get squashed down to a single upload thread, which hurts performance!
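A common way to break up sequential names, and one that Cloud Storage’s request-rate guidance describes, is to prepend a short hash of the name so that consecutive uploads land in different parts of the key space. The Python sketch below assumes that approach; the 6-character prefix length and the camera-style file names are illustrative choices, and this is not necessarily the exact fix used in the article.

```python
import hashlib


def hashed_object_name(name: str, prefix_len: int = 6) -> str:
    # Prepend a short, deterministic hash so that sequential names
    # (timestamps, counters) spread out across the object key space.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{name}"


if __name__ == "__main__":
    # Hypothetical timestamp-style names, as a camera uploader might produce.
    for i in range(3):
        print(hashed_object_name(f"camera-42/2017-12-01T08:00:{i:02d}.jpg"))
```

Because the hash is computed from the original name, the mapping is deterministic and the original name is still embedded in the object name. The cost is that objects no longer sort, or list by prefix, in upload order.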

As shown in the video and article, we were able to help a nursery camera company get past this issue with a few small fixes.

2. Improving Compute Engine boot time with custom images

Any cloud-based service needs to grow and shrink its resource allocations to respond to traffic load. Most of the time, this is a good thing, especially during the holiday season. ;) As traffic increases to your service/application, your backends will need to spin up more Compute Engine VMs to provide a consistent experience to your users.

However, if it takes too long for your VMs to start up, the quality and performance for your users can suffer, especially if your VM needs to do a lot of work in its startup script, like compiling code or installing large packages.

As we showed in the video (and article), you can bake a lot of that work into a custom boot disk image. When new VMs are created, their boot disks are built from that image with everything already installed, so they don’t have to do all that work from scratch.
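As a rough sketch of the workflow, the Python snippet below assembles the two `gcloud` commands involved: creating a custom image from a disk that already has your software installed, then creating an instance template that boots from that image. The disk, zone, image, and template names are hypothetical placeholders, and the script only prints the commands unless you explicitly turn off the dry run.

```python
import shlex
import subprocess


def bake_image_commands(source_disk: str, zone: str, image_name: str,
                        template_name: str) -> list:
    # Placeholder names are supplied by the caller; nothing here is project-specific.
    return [
        # Capture a pre-configured boot disk as a custom image.
        # Stopping the VM that owns the disk first keeps the image consistent.
        ["gcloud", "compute", "images", "create", image_name,
         f"--source-disk={source_disk}", f"--source-disk-zone={zone}"],
        # New VMs created from this template boot from the baked image.
        ["gcloud", "compute", "instance-templates", "create", template_name,
         f"--image={image_name}"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    run(bake_image_commands("web-base-disk", "us-central1-a",
                            "web-image-v1", "web-template-v1"))
```

An autoscaled managed instance group that uses the template then starts each new VM from the baked image instead of repeating the expensive setup in a startup script.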

If you’re looking to improve your GCE boot performance, custom images are worth checking out!

1. App Engine boot time

Modern managed languages (Java, Python, JavaScript, etc.) typically have a runtime dependency-loading step during the program’s initialization phase, when code is imported and instantiated.

Before execution can begin, any global data, functions or state information are also set up. Most of the time, these systems are global in scope, since they need to be used by so many subsystems (for example, a logging system).

In the case of App Engine, this global initialization work can end up delaying startup time, since it must complete before an instance can serve a request. And as we showed in the video and article, when your application spins up new instances to respond to spikes in workload, this up-front global initialization can put a hurt on your request response times.

See you soon!

For the rest of 2017, our Cloud Performance team is enjoying a few hot cups of tea, relaxing over the holidays, and counting down the days until the new year. In 2018, we’ve got a lot of awesome new topics to cover, including networking performance, Cloud Functions, and Cloud Spanner!

Until then, make sure you check out the Cloud Performance Atlas videos on YouTube or our article series on Medium.

Thanks again for a great year, everyone, and remember: every millisecond counts!