Managing Cloud Storage soft delete at scale
Geoffrey Noer
Group Product Manager
Michael Roth
Software Engineer
Have you ever accidentally deleted your data? Unfortunately, many of us have, which is why most operating systems on personal computers have a recycle bin / trash can where you can go to get your files back. On the enterprise side, these accidental deletions can be at a much larger scale – sometimes involving millions or even billions of objects. There is also the prospect of someone gaining unauthorized access to your data and either performing a ransomware attack to try to hold your data hostage or simply deleting it!
We recently launched soft delete for Cloud Storage, an important new data protection feature compatible with all existing Cloud Storage features and workloads. It offers improved protection against accidental and malicious data deletion by providing you with a way to retain and restore recently deleted data at enterprise scale. With soft delete in place, you may also find that your organization can move more quickly when “pruning” old data, knowing that soft delete provides an undo mechanism in case of any mistakes.
In this blog, we provide you with the tools and insights you need to optimize your soft delete settings, even at scale, so that you use soft delete to protect your data based on its business criticality.
How does soft delete work and how is it billed?
When soft delete is enabled, deleted objects are retained in a hidden soft-deleted state for the soft delete retention duration set on that bucket, instead of being permanently deleted. If you need any of the soft-deleted objects back, simply run a restore and they are copied back to live state.
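For example, a minimal sketch with a recent version of the gcloud CLI (BUCKET_NAME, OBJECT_NAME, and GENERATION are placeholders) could look like this:

# List soft-deleted objects in a bucket, including their generation numbers.
gcloud storage ls --soft-deleted gs://BUCKET_NAME

# Restore a specific soft-deleted object back to the live state.
gcloud storage restore gs://BUCKET_NAME/OBJECT_NAME#GENERATION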
We introduced soft delete with a seven-day retention duration enabled on all existing buckets and as the default for newly created buckets. Soft delete is on by default because accidental deletion events are unfortunately all too common and much of the data stored in Cloud Storage is business-critical in nature. In addition to the seven-day default, you can select any number of days between 7 and 90, or you can disable the feature entirely.
Soft delete usage is billed based on the storage class of the recently deleted objects. In many cases, this only increases bills by a few percentage points, which hopefully represents a good value for the amount of protection that soft delete provides. However, enabling soft delete on buckets that contain a large amount of short-lived (frequently deleted) data can result in large billing increases, since an object deleted after an hour would be billed for the one hour the object was live, plus seven days of soft delete usage.
How valuable is your data?
To reach a state where soft delete protects you from data deletion risks at the lowest possible economic impact, we recommend that you ask yourself the following three questions:
- How important is my organization’s data? Are we storing temporary objects or media transcodes that could easily be regenerated if they were lost? Soft delete protection is unlikely to be worth it in these cases. Or are we storing data that would put my business and/or customer relationships at risk if it were lost? Soft delete could provide a vital level of protection here.
- What level of data protection do we already have? If Cloud Storage holds the only copy of your business-critical data, then soft delete protection is much more important than if you were storing long-term backups of all your data in another Google Cloud region, on-prem, or with another cloud provider.
- How much data protection can we afford? Soft delete can be much less expensive than traditional enterprise backups, but it can still have a significant impact on billing, depending mostly on your deletion rates. We recommend considering the cost of soft delete relative to your overall Google Cloud bill rather than to storage alone, because it protects the business data that your broader workloads rely on. You may find that leaving soft delete enabled on all your buckets has only a single-digit percentage impact on your cloud bill, which may well be worth it given the protection it provides against both accidental and malicious deletion events.
Once you have a good idea as to where and how much you want to use soft delete, the next steps depend on your architectural choices and the overall complexity of your organization’s cloud presence. For the rest of this blog, we’ll cover how to assess soft delete’s impact and act on that information, starting with bucket-level metrics, then acting on bucket-level settings within a project, using Terraform for management, and concluding with organizational-level management approaches.
Assessing bucket-level impacts
Many people will find that the easiest way to assess and act on soft delete settings is to use the soft delete impact wizard in the console. This feature allows you to see your project’s top 100 buckets with the highest billing impact and disable soft delete on any buckets that you select.
You can also estimate bucket-level soft delete costs using Cloud Monitoring metrics and visualize them using the Metrics Explorer. You might want to inspect a handful of buckets that are representative of different kinds of datasets to get a better idea of which ones are more and less expensive to protect with soft delete.
Storage metrics
Recently, we introduced new storage metrics that allow you to break down the object counts, bytes, and byte seconds by storage class, and then further by live vs. noncurrent vs. soft-deleted vs. multipart. These breakdowns can be extremely useful even beyond any soft delete analysis you may want to perform. In addition, you can now inspect the deletion rate using the new deleted_bytes metric:
The storage/v2/deleted_bytes metric is a delta count of deleted bytes per bucket, grouped by storage class. It can be used to estimate soft delete billing impact, even if soft delete is disabled or set to a different retention duration than the one being considered.
The absolute cost of soft delete can be calculated as: soft delete retention duration × deleted bytes × storage price. For example, the cost (assuming us-central1 and Standard storage) of enabling a 7-day soft delete policy with 100,000 GB of deletions during the course of a month is (7 / 30.4375 days) × 100,000 GB × $0.02/GB per month = $459.96 (where 30.4375 is the average number of days per month).
The relative cost of soft delete can also be calculated by comparing the storage/v2/deleted_bytes metric to the existing storage/v2/total_byte_seconds metric: soft delete retention duration × deleted bytes / total bytes. Continuing from the above example and given 1,000,000 GB-months of storage for the month, the relative cost of enabling soft delete in this case is: (7 / 30.4375 days) × 100,000 GB / 1,000,000 GB = 2.3% impact.
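To make the arithmetic concrete, here is a minimal Python sketch of both calculations, using the example numbers above (replace them with values taken from your own metrics and pricing):

# Estimate the absolute and relative cost of soft delete from example inputs.
AVG_DAYS_PER_MONTH = 30.4375

retention_days = 7              # soft delete retention duration
deleted_gb = 100_000            # GB deleted during the month (from storage/v2/deleted_bytes)
stored_gb_months = 1_000_000    # GB-months stored (from storage/v2/total_byte_seconds)
price_per_gb_month = 0.02       # $/GB per month, us-central1 Standard storage

retention_months = retention_days / AVG_DAYS_PER_MONTH
absolute_cost = retention_months * deleted_gb * price_per_gb_month
relative_impact = retention_months * deleted_gb / stored_gb_months

print(f"Absolute soft delete cost: ${absolute_cost:,.2f} per month")  # ~$459.96
print(f"Relative billing impact: {relative_impact:.1%}")              # ~2.3%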
Metrics Explorer
You can use the Metrics Explorer to create charts that visualize estimated soft delete costs for a given bucket:
- In the navigation panel of the Google Cloud console, select Monitoring, and then select Metrics explorer.
- Verify that MQL is selected in the Language toggle.
- Enter the following query into the query editor:
Note: This query assumes a 7-day (604,800 seconds) soft delete window.
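The exact query may differ from the one in the Cloud Storage documentation; the following is a minimal MQL sketch that sums deleted bytes per bucket and scales them by the 604,800-second retention window to approximate soft-deleted byte-seconds:

fetch gcs_bucket
| metric 'storage.googleapis.com/storage/v2/deleted_bytes'
| align delta(1d)
| every 1d
| group_by [resource.bucket_name], [deleted_bytes: sum(value.deleted_bytes)]
| value [soft_delete_byte_seconds: val(0) * 604800]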
Taking action within a project
If you are a storage administrator making decisions about soft delete settings within a project, you may want to go over your list of buckets manually and make decisions based on your business knowledge of what should be protected versus what can go without soft delete. For a larger number of buckets, you might instead use the metrics above to generate a list of buckets that exceed a billing impact threshold (e.g., 20%) and then disable soft delete on those buckets.
As mentioned earlier, the soft delete impact wizard is a console feature that lists your project’s top 100 buckets with the highest billing impact and lets you disable soft delete on the buckets that you select.
Alternatively, we published a soft delete billing impact Python script on GitHub that generates a list of buckets in a project that exceed the percentage of billing impact that you specify, factoring in the storage classes of objects inside a bucket. The script can also be used to update the soft delete policies based on a specified relative cost threshold.
We recommend using the Google Cloud CLI to configure soft delete settings on one or more buckets within a project. After you install the CLI and sign in, gcloud storage commands like the following can be used to enable, update, or disable soft delete policies within a specified project:
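The following is a minimal sketch (PROJECT_ID and BUCKET_NAME are placeholders; choose retention durations that match your own policy):

# Enable or update soft delete with a 30-day retention duration on a single bucket.
gcloud storage buckets update gs://BUCKET_NAME --soft-delete-duration=30d

# Set a 14-day retention duration on every bucket in a project.
gcloud storage buckets update --project=PROJECT_ID --soft-delete-duration=14d "gs://*"

# Disable soft delete on every bucket in a project.
gcloud storage buckets update --project=PROJECT_ID --clear-soft-delete "gs://*"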
You may need to quote the argument because some shells can attempt to expand wildcards before passing the arguments to the gcloud CLI. See https://cloud.google.com/storage/docs/wildcards#surprising-behavior.
Taking action with Terraform
If you use an orchestration layer like Terraform, adapting to soft delete should be as simple as updating your templates and deciding on the soft delete retention duration for each workload. This could also involve creating new templates dedicated to short-lived data so that soft delete is disabled for buckets created from those templates. Once you’ve defined your settings, Terraform can update existing buckets to conform to the templates, and new buckets should be created with your intended settings.
With Terraform, the primary thing you need to do is update your template(s) to include a soft delete policy. Here is an example of setting the soft delete retention duration to seven days (604,800 seconds) in a google_storage_bucket resource:
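A minimal sketch follows; the bucket name and location are placeholders, and the soft_delete_policy block requires a recent version of the google provider:

resource "google_storage_bucket" "example" {
  name     = "my-example-bucket"  # placeholder bucket name
  location = "US"                 # placeholder location

  soft_delete_policy {
    retention_duration_seconds = 604800  # 7 days
  }
}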
To disable soft delete instead, simply set retention_duration_seconds = 0.
For more information, please also see: Use Terraform to create storage buckets and upload objects.
Taking action across a large organization
If you work for a large enterprise with thousands of projects and millions of buckets and mostly don’t use an orchestration layer, then a manual approach is not realistic, and you will need to make decisions at scale. If this is your situation, we recommend that you first learn about the bucket-level metrics and how to take action within a project as described earlier. In this section, we’ll extend these techniques to the organization level. Again, we assume you have already installed an up-to-date version of the gcloud CLI, which you will need for this section.
To implement a policy across even the most complex of organizations, you will likely need to approach it in three steps using the gcloud command line environment:
- Obtain permissions: ensure you can list and change bucket-level settings across the organization
- Assess: decide on an impact threshold above which you will disable soft delete, and obtain a list of buckets surpassing that threshold
- Act: disable soft delete on that list of buckets and consider setting a defaults tag to change the default soft delete duration for newly created buckets
Obtain permissions
Before you can do anything, you will need to identify someone with sufficient access permissions to analyze and change bucket-level configurations across your organization. This could be an existing Organization Administrator. Alternatively, your Organization Administrator could create a custom role and assign it to you or another administrator for the specific purpose of managing soft delete settings:
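The following is a minimal sketch of how this could look with the gcloud CLI; ORG_ID, the softDeleteAdmin role ID, the permission list, and the member email are illustrative assumptions that you should adapt to your organization:

# Create an org-level custom role with the permissions needed to list projects
# and to list, inspect, and update buckets (illustrative permission set).
gcloud iam roles create softDeleteAdmin \
  --organization=ORG_ID \
  --title="Soft Delete Admin" \
  --permissions="resourcemanager.projects.list,storage.buckets.list,storage.buckets.get,storage.buckets.update"

# Grant the custom role to the administrator who will manage soft delete settings.
gcloud organizations add-iam-policy-binding ORG_ID \
  --member="user:storage-admin@example.com" \
  --role="organizations/ORG_ID/roles/softDeleteAdmin"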
Once this process is complete and all your buckets are updated, the Organization Administrator can safely delete this custom role if there is no ongoing need for a role with continued access to these settings:
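For example, continuing the sketch above (same assumed role ID, member, and ORG_ID):

# Remove the binding and delete the custom role once it is no longer needed.
gcloud organizations remove-iam-policy-binding ORG_ID \
  --member="user:storage-admin@example.com" \
  --role="organizations/ORG_ID/roles/softDeleteAdmin"

gcloud iam roles delete softDeleteAdmin --organization=ORG_ID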
Assess
Armed with the power to act on bucket-level configurations across your organization, you can apply the project-level analysis above to obtain a list of all buckets across your organization that exceed your chosen impact threshold. Alternatively, you might choose to apply a uniform setting like 0d or 14d across all buckets in your organization.
Act
To update the soft delete policy for all your buckets across all your projects, you can iterate through all your projects, making the appropriate changes to the buckets in each project. For example, the following command disables soft delete on all buckets across your organization:
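A minimal sketch uses a shell loop over every project visible to you (this changes every bucket in the organization, so consider testing on a small subset of projects first):

for project in $(gcloud projects list --format="value(projectId)"); do
  gcloud storage buckets update --project="$project" --clear-soft-delete "gs://*"
done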
You may need to quote the argument because some shells can attempt to expand wildcards before passing the arguments to the gcloud CLI. See https://cloud.google.com/storage/docs/wildcards#surprising-behavior.
Alternatively, you can use the filter option of projects list to target only a subset of your projects. For example, you might want to update projects with a specific label (--filter="labels.environment:prod") or with a certain parent (--filter="parent.id:123456789").
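Continuing the sketch above, a filtered loop might look like this (the label key and value, and the 14-day duration, are assumptions):

for project in $(gcloud projects list --filter="labels.environment:prod" --format="value(projectId)"); do
  gcloud storage buckets update --project="$project" --soft-delete-duration=14d "gs://*"
done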
As a best practice, we recommend replacing the blanket per-project action above with a command that selectively disables soft delete on specific bucket IDs. For example, you could loop through your project list, running the soft delete billing impact Python script for each project to update bucket settings according to a percentage impact threshold that you select, which gives you a much more tailored outcome.
Finally, you may want to consider changing the default soft delete duration if the seven-day service default isn’t a good fit for your organization. By creating a soft delete defaults tag at the organization level, you can attach the tag with a specific value at the organization and/or project level; that value is then used as the default soft delete duration whenever a bucket is created without an explicit soft delete setting. Optionally, the defaults tag can be used in conjunction with an org policy constraint to require a specific soft delete retention duration if you want to put a mandate in place for your organization.
Summary
By following the best practices in this blog and taking advantage of the available tooling and controls, we hope that you now feel more confident in your ability to protect your business-critical data with soft delete while simultaneously minimizing its billing impact.