Best practices for Cloud Storage cost optimization

Whether you’re part of a multi-billion dollar conglomerate trying to review sales from H1, or you’re just trying to upload a video of your cat playing the piano, you need somewhere for that data to reside. 

We often hear from our customers that they’re using Cloud Storage, the unified object store from Google Cloud Platform (GCP), as a medium for this type of general storage. Its robust API integration with our multitude of services makes it a natural starting point to outline some of our pro tips, based on what we as Technical Account Managers (TAMs) have seen in the field, working side by side with our customers. Part of our responsibility is to offer direction to our customers on making decisions that can reduce costs and help get the most out of their GCP investments. 

While storing an object in the cloud in itself is an easy task, making sure you have the soundest approach for the situation you are in requires a bit more forethought. One of the benefits of having a scalable, limitless storage service is that, much like an infinitely scalable attic in your house, there are going to be some boxes and items (or buckets and objects) that you really can’t justify holding onto. These items incur a cost over time, and whether you need them for business purposes or are just holding onto them on the off chance that they might someday be useful (like those wooden nunchucks you love), the first step is creating a practice around how to identify the usefulness of an object/bucket to your business. So let’s get the broom and dustpan, and get to work!

Cleaning up your storage when you’re moving to cloud

There are multiple factors to consider when looking into cost optimization. The trick is to avoid hurting performance and to avoid throwing out anything that must be retained, whether for compliance, legal, or simply business value reasons. With data emerging as a top business commodity, you’ll want to use appropriate storage classes both for near-term use and for longitudinal analysis. There are several storage classes to choose from, each with different costs, availability, and minimum storage durations.

There are rarely one-size-fits-all approaches to anything when it comes to cloud architecture. However, there are some recurring themes we have noticed as we work alongside our customers. These lessons learned can apply to any environment, whether you’re storing images or building advanced machine learning models.

The natural starting point is to first understand “What costs me money?” when using Cloud Storage. The pricing page is incredibly useful, but we’ll get into more detail in this post. When analyzing customer Cloud Storage use, we consider these needs:

  1. Performance

  2. Retention

  3. Access patterns

There can be many additional use cases with cost implications, but we’ll focus on recommendations around these themes. Here are more details on each.

Retention considerations and tips

The first thing to consider when looking at a data type is its retention period. Asking yourself questions like “Why is this object valuable?” and “For how long will this be valuable?” is critical to determining the appropriate lifecycle policy. A lifecycle policy is a set of rules on a bucket: when an object in that bucket meets the conditions you define (such as its age), Cloud Storage automatically deletes it or changes its storage class. Think of this as your own personal butler that systematically keeps your attic organized and clean, except that instead of costing money, this butler saves you money.

We see customers use lifecycle policies in a multitude of ways with great success. A great application is compliance in legal discovery. Depending on your industry and data type, laws may regulate which data must be retained and for how long. Using a Cloud Storage lifecycle policy, you can have an object deleted automatically once it has met the minimum retention threshold for legal compliance, so you aren’t charged for retaining it longer than needed and you don’t have to remember which data expires when. Cloud Storage also has a Bucket Lock feature, which lets you set a retention policy on a bucket to minimize the opportunity for accidental deletion. If you’re subject to FINRA, SEC, or CFTC regulations, this is a particularly useful feature. Bucket Lock may also help you address certain healthcare industry retention regulations.

Within Cloud Storage, you can also set policies that change an object’s storage class. This is particularly useful for data that will be accessed relatively frequently for a short period of time, but rarely after that. You might want to retain these objects for a longer period for legal or security purposes, or for general long-term business value. A lab environment is a good example. Once you complete an experiment, you likely want to analyze the results quite a bit in the near term, but in the long term you won’t access that data very frequently. A policy that converts these objects to the Nearline or Coldline storage class after a month is a great way to save on long-term storage costs.
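The retention and class-change rules described above can be sketched as a lifecycle configuration. The following minimal Python sketch builds the JSON document in the shape Cloud Storage lifecycle configurations use (a `rule` list of `action` and `condition` pairs). The helper name `build_lifecycle_config` and the 30-day and seven-year thresholds are illustrative assumptions, not recommendations; check the lifecycle documentation for the actions and conditions that suit your bucket.

```python
import json


def build_lifecycle_config(nearline_after_days=30, delete_after_days=None):
    """Build a lifecycle configuration as a plain dict.

    Both thresholds are hypothetical values chosen for illustration.
    """
    rules = [
        {
            # Change the storage class once an object reaches the given age.
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": nearline_after_days},
        }
    ]
    if delete_after_days is not None:
        # Delete the object once the assumed retention period has passed.
        rules.append(
            {"action": {"type": "Delete"}, "condition": {"age": delete_after_days}}
        )
    return {"rule": rules}


# Example: Nearline after a month, delete after an assumed seven-year retention.
config = build_lifecycle_config(nearline_after_days=30, delete_after_days=7 * 365)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied to a bucket with a tool such as `gsutil lifecycle set`.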

Access pattern considerations and tips

The ability to move objects into lower-cost storage classes is a powerful tool, but one that must be used with caution. While colder storage is cheaper to maintain for an object that is accessed infrequently, you will incur additional charges if you suddenly need to frequently access data or metadata that has been moved to a “colder” storage option. There are also cost implications when removing data from a particular storage class. For instance, Nearline storage currently has a minimum storage duration of 30 days, so deleting or moving an object sooner incurs an early deletion charge. If you need to access that data with increased frequency, you can make a copy in a regional storage class instead to avoid increased access charges.
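To make the minimum-duration point concrete, here is a rough sketch of the arithmetic. It assumes that an object removed before the minimum duration is billed as if it had been stored for the full period, and it pro-rates the monthly price over a 30-day month for simplicity. The function name and the price used in the example are hypothetical; see the pricing page for real rates.

```python
def early_deletion_charge(size_gb, price_per_gb_month, days_stored, minimum_days=30):
    """Estimate the charge for removing an object before its minimum duration.

    Assumes billing for the remaining days up to the minimum duration,
    with the monthly price pro-rated over a 30-day month (a simplification).
    """
    remaining_days = max(minimum_days - days_stored, 0)
    return size_gb * price_per_gb_month * remaining_days / 30


# Example with a made-up price of 0.01 per GB per month:
# 1,000 GB removed after 10 days is still billed for the remaining 20 days.
charge = early_deletion_charge(size_gb=1000, price_per_gb_month=0.01, days_stored=10)
print(f"Estimated early deletion charge: {charge:.2f}")
```

An object kept past the minimum duration yields a charge of zero in this model, which is why short-lived data is usually a poor fit for colder classes.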

When considering the opportunities for cost savings in the long term, you should also think about whether your data will need to be accessed in the long term and how frequently it will need to be accessed if it does become valuable again. For example, if you are a CFO looking at a quarterly report on cloud expenses and only need to pull that information every three months, you might not need to worry about the increased charges accrued for the retrieval of that data, because it will still be cheaper than maintaining the storage in a regional bucket year round. Some retrieval costs on longer-term storage classes can be substantial and should be carefully reviewed when making storage class decisions. See the pricing page for the relative differences in cost.
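The quarterly-report reasoning above can be checked with a back-of-the-envelope comparison. This sketch compares yearly cost for a warmer class with no retrieval fee against a colder class with a per-GB retrieval fee, and estimates how many full reads per year it takes before the colder class stops being cheaper. All prices are made-up placeholders, the function names are hypothetical, and the model ignores operation charges, network egress, and minimum storage durations.

```python
def annual_cost(size_gb, storage_price_per_gb_month,
                retrieval_price_per_gb=0.0, full_reads_per_year=0):
    """Yearly cost of storing a dataset and reading it in full N times."""
    storage = size_gb * storage_price_per_gb_month * 12
    retrieval = size_gb * retrieval_price_per_gb * full_reads_per_year
    return storage + retrieval


def break_even_reads_per_year(warm_price, cold_price, retrieval_price):
    """Full reads per year at which the colder class costs the same as the warmer one."""
    return (warm_price - cold_price) * 12 / retrieval_price


# Placeholder prices per GB: warm 0.02/month, cold 0.01/month, retrieval 0.01.
warm = annual_cost(1000, 0.02)
cold = annual_cost(1000, 0.01, retrieval_price_per_gb=0.01, full_reads_per_year=4)
print(f"warm: {warm:.2f}  cold with quarterly reads: {cold:.2f}")
print(f"break-even reads per year: {break_even_reads_per_year(0.02, 0.01, 0.01):.1f}")
```

With these placeholder numbers, a dataset read once a quarter stays cheaper in the colder class; plugging in current prices from the pricing page shows where that stops being true for your own access pattern.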

Performance considerations and tips

“Where is this data going to be accessed from?” is a major question when you’re thinking about performance and trying to establish the best storage location for your particular use case. Locality can directly influence how fast content is pushed to and retrieved from your selected storage location. For instance, a “hot object” with global utilization (such as data behind a frequently accessed application, like your employee time-tracking system) would fit well in a multi-regional location, which enables an object to be stored in multiple locations. This can potentially bring the content closer to your end users as well as enhance your overall availability. Another example is a gaming application with a broad geo-distribution of users. A multi-regional location brings the content closer to each user for a better experience (less lag) and ensures that your last saved file is distributed across several locations, so you don’t lose your hard-earned loot in the event of a regional outage.

One thing to keep in mind when considering this option is that storage in multi-regional locations allows for better performance and higher availability, but it comes at a premium and could increase network egress charges, depending on your application’s design. This is an important factor to consider during the application design phase. Another option when you’re thinking about performance is buckets in regional locations, a good choice if your region is relatively close to your end users. You can select the specific region that your data will reside in and get guaranteed redundancy within that region. This location type is typically a safe bet when you have a team working in a particular area and accessing a dataset with relatively high frequency. This is the most commonly used storage location type that we see, as it handles most workloads’ needs quite well. It’s fast to access, redundant within the region, and affordable overall as an object store.

Overall, for something as simple-sounding as a bucket, there is a vast range of possibilities, all with varying cost and performance implications. As you can see, there are many ways to fine-tune your own company’s storage needs to help save some space and some cash in a well thought-out, automated way. GCP provides many features to help ensure you are getting the most out of your GCP investment, with plenty more coming soon. Find more in these Next ‘19 sessions about optimizing your GCP costs.