One year of Cloud Performance Atlas
In March of this year, we kicked off a new content initiative called Cloud Performance Atlas, where we highlight best practices for GCP performance, and how to solve the most common performance issues that cloud developers come across.
Here’s the top topics from 2017 that developers found most useful.
5. The bandwidth delay problemEvery now and again, I’ll get a question from a company who recently updated their connection bandwidth from their on-premises systems to Google Cloud, and for some reason, aren’t getting any better performance as a result. The issue, as we’ve seen multiple times, usually resides in an area of TCP called “the bandwidth delay problem.”
The TCP algorithm works by transferring data in packets between two connections. A packet is sent to a connection, and then an acknowledgement packet is returned. To get maximum performance in this process, the connection between the two endpoints has to be optimized so that neither the sender or receiver is waiting around for acknowledgements from prior packets.
The most common way to address this problem is to adjust the window sizes for the packets to match the bandwidth of the connection. This allows both sides to continue sending data until an ACK arrives back from the client for an earlier packet, thereby creating no gaps and achieving maximum throughput. As such, a low window size will limit your connection throughput, regardless of the available or advertised bandwidth between instances.
4. Improving CDN performance with custom keysGoogle Cloud boasts an extremely powerful CDN that can leverage points-of-presence around the globe to get your data to users as fast as possible.
When setting up Cloud CDN for your site, one of the most important things is to ensure that you’re using the right Custom Cache Keys to configure what assets get cached, and which ones don’t. In most cases, this isn’t an issue, but if you’re leveraging a large site with content re-used across protocols (i.e., http and https) you can run into a problem where your cache fill costs can increase more than expected.
3. Google Cloud Storage and the sequential filename challengeGoogle Cloud Storage
https://cloud.google.com/storage/is a one-stop-shop for all your content serving needs. However, one developer continued to run into a problem of slow upload speeds when pushing their content into the cloud.
The issue was that Cloud Storage uses the file path and name of the files being uploaded to segment and shard the connection to multiple frontends (improving performance). As we found out, if those file names are sequential then you could end up in a situation where multiple connections get squashed down to a single upload thread (thus hurting performance)!
2. Improving Compute Engine boot time with custom imagesAny cloud-based service needs to grow and shrink its resource allocations to respond to traffic load. Most of the time, this is a good thing, especially during the holiday season. ;) As traffic increases to your service/application, your backends will need to spin up more Compute Engine VMs to provide a consistent experience to your users.
However, if it takes too long for your VMs to start up, then the quality and performance for you users can be negatively impacted, especially if your VM needs to do a lot of things during its startup script, like compile code, or install large packages.
As we showed in the video, (article) you can pre-compute a lot of that work into a custom image of boot disks. When your VMs are loaded, they simply need to copy in the custom image to the disk (with everything already installed), rather than doing everything from scratch.
If you’re looking to improve your GCE boot performance, custom images are worth checking out!
Before execution can begin, any global data, functions or state information are also set up. Most of the time, these systems are global in scope, since they need to be used by so many subsystems (for example, a logging system).
In the case of App Engine, this global initialization work can end up delaying start-time, since it must complete before a request can be serviced. And as we showed in the video and article, as your application responds to spikes in workload, this type of global variable contention can put a hurt on your request response times.
See you soon!For the rest of 2017 our Cloud Performance team is enjoying a few hot cups of tea, relaxing with the holidays and counting down the days until the new year. In 2018, we’ve got a lot of awesome new topics to cover, including increased networking performance, Cloud Functions and Cloud Spanner!
Thanks again for a great year everyone, and remember, every millisecond counts!