Optimize cost: Storage

Last reviewed 2023-08-08 UTC

This document in the Google Cloud Architecture Framework provides recommendations to help you optimize the usage and cost of your Cloud Storage, Persistent Disk, and Filestore resources.

The guidance in this section is intended for architects and administrators responsible for provisioning and managing storage for workloads in the cloud.

Cloud Storage

When you plan Cloud Storage for your workloads, consider your requirements for performance, data retention, and access patterns.

Storage class

Choose a storage class that suits the data-retention and access-frequency requirements of your workloads, as recommended in the following table:

Storage requirement | Recommendation
------------------- | --------------
Data that's accessed frequently (high-throughput analytics or data lakes, websites, streaming videos, and mobile apps) | Standard storage
Low-cost storage for infrequently accessed data that can be stored for at least 30 days (for example, backups and long-tail multimedia content) | Nearline storage
Infrequently accessed data that can be stored for at least 90 days (for example, data replicas for disaster recovery) | Coldline storage
Lowest-cost storage for infrequently accessed data that can be stored for at least 365 days (for example, legal and regulatory archives) | Archive storage
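
For example, here's a minimal sketch of creating a bucket with a non-default storage class by using the gcloud CLI (the bucket name and location are illustrative placeholders):

```bash
# Create a bucket whose default storage class is Nearline, suitable for
# data that's accessed less than once a month (for example, backups).
gcloud storage buckets create gs://my-backup-bucket \
    --location=us-central1 \
    --default-storage-class=nearline
```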

Location

Select the location for your buckets based on your requirements for performance, availability, and data redundancy.

  • Regions are recommended when your end users are located near a specific region. You select the region, and your data is stored redundantly within it. Regions offer fast, redundant, and affordable storage for datasets that users within a particular geographical area access frequently.
  • Multi-regions provide high availability for geographically distributed users. However, the storage cost is higher than for regions. Multi-region buckets are recommended for content-serving use cases and for analytics workloads with lower performance requirements.
  • Dual-regions provide high availability and data redundancy. Google recommends dual-region buckets for high-performance analytics workloads and for use cases that require true active-active buckets with compute and storage colocated in multiple locations. Dual-regions let you choose where your data is stored, which can help you meet compliance requirements. For example, you can use a dual-region bucket to meet industry-specific requirements regarding the physical distance between copies of your data in the cloud (see the sketch after this list).
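
As a sketch, you can create a configurable dual-region bucket by passing two regions to the gcloud CLI's `--placement` flag (the bucket name and region pair are illustrative placeholders; verify flag support in your gcloud version):

```bash
# Create a dual-region bucket that stores data redundantly across two
# specific regions, which can help meet data-residency requirements.
gcloud storage buckets create gs://my-analytics-bucket \
    --placement=us-central1,us-east1
```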

Lifecycle policies

Optimize storage cost for your objects in Cloud Storage by defining lifecycle policies. These policies help you save money by automatically downgrading the storage class of specific objects or deleting objects based on conditions that you set.

Configure lifecycle policies based on how frequently objects are accessed and how long you need to retain them. The following are examples of lifecycle policies; a configuration sketch follows the examples:

  • Downgrade policy: You expect a dataset to be accessed frequently but for only around three months. To optimize the storage cost for this dataset, use Standard storage, and configure a lifecycle policy to downgrade objects older than 90 days to Coldline storage.
  • Deletion policy: A dataset must be retained for 365 days to meet certain legal requirements and can be deleted after that period. Configure a policy to delete any object that's older than 365 days.

    To help you ensure that data that needs to be retained for a specific period (for legal or regulatory compliance) is not deleted before that date or time, configure retention policy locks.
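
The downgrade and deletion policies above can be combined in a single lifecycle configuration. Here's a minimal sketch using the gcloud CLI (the bucket name is a placeholder, and the JSON follows the lifecycle configuration file format accepted by `--lifecycle-file`):

```bash
# lifecycle.json: downgrade objects to Coldline after 90 days, and
# delete objects once they're older than 365 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
EOF

# Apply the lifecycle configuration to the bucket.
gcloud storage buckets update gs://my-dataset-bucket \
    --lifecycle-file=lifecycle.json
```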

Accountability

To drive accountability for operational charges, network charges, and data-retrieval costs, use the Requester Pays configuration where appropriate. With this configuration, the costs of accessing the data are charged to the department or team that uses it, rather than to the owner of the bucket.
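
Here's a sketch of enabling Requester Pays with the gcloud CLI (the bucket, object, and project names are placeholders); requesters then identify the project to bill when they access the data:

```bash
# Enable Requester Pays so that access costs are billed to the
# requester's project instead of the bucket owner's project.
gcloud storage buckets update gs://my-shared-bucket --requester-pays

# Requesters specify the project to bill when reading the data.
gcloud storage cp gs://my-shared-bucket/report.csv . \
    --billing-project=consumer-team-project
```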

Define and assign cost-tracking labels consistently for all your buckets and objects. Automate labeling when feasible.
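
For example, here's a sketch of assigning cost-tracking labels to a bucket (the label keys and values are illustrative):

```bash
# Attach labels that you can use to group and filter costs in
# Cloud Billing reports.
gcloud storage buckets update gs://my-dataset-bucket \
    --update-labels=team=analytics,env=prod
```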

Redundancy

Use the following techniques to maintain the required storage redundancy without data duplication:

  • To maintain data resilience with a single source of truth, use a dual-region or multi-region bucket rather than redundant copies of data in different buckets. Dual-region and multi-region buckets provide redundancy across regions. Your data is replicated asynchronously across two or more locations, and is protected against regional outages.
  • If you enable object versioning, consider defining lifecycle policies that delete the oldest versions of an object as newer versions become noncurrent (see the sketch after this list). Each noncurrent version of an object is charged at the same rate as the live version of the object.
  • Disable object versioning when it's no longer necessary.
  • Review your backup and snapshot retention policies periodically, and adjust them to avoid unnecessary backups and data retention.
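
Here's a minimal sketch of a lifecycle configuration that limits the cost of noncurrent versions (the bucket name and thresholds are illustrative placeholders):

```bash
# lifecycle-versions.json: delete a noncurrent version once at least two
# newer versions exist, or once it has been noncurrent for 30 days.
cat > lifecycle-versions.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"isLive": false, "numNewerVersions": 2}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"isLive": false, "daysSinceNoncurrentTime": 30}
    }
  ]
}
EOF

gcloud storage buckets update gs://my-versioned-bucket \
    --lifecycle-file=lifecycle-versions.json
```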

Persistent Disk

Every VM instance that you deploy in Compute Engine has a boot disk and, optionally, one or more data disks. Each disk incurs costs based on its provisioned size, disk type, and region. Any snapshots that you take of your disks incur costs based on the size of the snapshot.

Use the following design and operational recommendations to help you optimize the cost of your persistent disks:

  • Don't over-allocate disk space. You can't reduce disk capacity after provisioning, so start with a small disk and increase its size when required (see the resize sketch after this list). Persistent disks are billed for provisioned capacity, not for the data that's stored on the disks.
  • Choose a disk type that matches the performance characteristics of your workload. SSD provides high IOPS and throughput, but costs more than standard persistent disks.

  • Use regional persistent disks only when protecting data against zonal outages is essential. Regional persistent disks are replicated to another zone within the region, so you incur double the cost of equivalent zonal disks.

  • Track the usage of your persistent disks by using Cloud Monitoring, and set up alerts for disks with low usage.

  • Delete disks that you no longer need.

  • For disks that contain data that you might need in the future, consider archiving the data to low-cost Cloud Storage and then deleting the disks.

  • Look for and respond to the recommendations in the Recommendation Hub.
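
Here's a sketch of growing a disk in place when more capacity is needed (the disk name, zone, and size are placeholders):

```bash
# Increase the size of an existing zonal persistent disk. Disks can be
# grown while attached to a VM, but they can never be shrunk.
gcloud compute disks resize my-data-disk \
    --zone=us-central1-a \
    --size=500GB

# After resizing, extend the file system inside the VM (for example,
# with resize2fs for ext4 volumes).
```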

Consider also using Hyperdisk volumes for high-performance storage, and local SSDs (ephemeral disks) for temporary storage.
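
As a sketch, Hyperdisk volumes let you provision performance separately from capacity (the disk name, zone, and values are illustrative; check which Hyperdisk types and flags are available in your region and gcloud version):

```bash
# Create a Hyperdisk Balanced volume, provisioning IOPS and throughput
# independently of capacity to match the workload's actual needs.
gcloud compute disks create my-hyperdisk \
    --zone=us-central1-a \
    --type=hyperdisk-balanced \
    --size=100GB \
    --provisioned-iops=3600 \
    --provisioned-throughput=290
```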

Disk snapshots are incremental by default and automatically compressed. Consider the following recommendations for optimizing the cost of your disk snapshots:

  • When feasible, organize your data in separate persistent disks. You can then choose to back up disks selectively, and reduce the cost of disk snapshots.
  • When you create a snapshot, select a location based on your availability requirements and the associated network costs.
  • If you intend to use a boot-disk snapshot to create multiple VMs, create an image from the snapshot, and then use the image to create your VMs (see the sketch after this list). This approach helps you avoid network charges for the data that travels between the location of the snapshot and the location where you restore it.
  • Consider setting up a retention policy to minimize long-term storage costs for disk snapshots.
  • Delete disk snapshots that you no longer need. Each snapshot in a chain might depend on data stored in a previous snapshot, so deleting a single snapshot doesn't necessarily delete all the data in it. To definitively delete data from snapshots, delete all the snapshots in the chain.
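
Here's a sketch of the image-based approach and of a snapshot schedule with a bounded retention period (the names, locations, and values are placeholders):

```bash
# Create an image from a boot-disk snapshot once, then create VMs from
# the image to avoid repeated cross-location restores from the snapshot.
gcloud compute images create base-image \
    --source-snapshot=boot-disk-snapshot

gcloud compute instances create vm-1 vm-2 \
    --zone=us-central1-a \
    --image=base-image

# Create a snapshot schedule that keeps daily snapshots for 14 days,
# which caps long-term snapshot storage costs. Attach it to a disk with
# `gcloud compute disks add-resource-policies`.
gcloud compute resource-policies create snapshot-schedule daily-14d \
    --region=us-central1 \
    --daily-schedule \
    --start-time=04:00 \
    --max-retention-days=14
```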

Filestore

The cost of a Filestore instance depends on its service tier, the provisioned capacity, and the region where the instance is provisioned. The following are design and operational recommendations to optimize the cost of your Filestore instances:

  • Select a service tier and storage type (HDD or SSD) that's appropriate for your storage needs.
  • Don't over-allocate capacity. Start with a small size and increase the size later when required. Filestore billing is based on provisioned capacity, not the stored data.
  • Where feasible, organize your data in separate Filestore instances. You can then choose to back up instances selectively, and reduce the cost of Filestore backups.
  • When choosing the region and zone, consider creating instances in the same zone as the clients. You're billed for data transfer traffic from the zone of the Filestore instance.
  • When you decide the region where Filestore backups should be stored, consider the data transfer charges for storing backups in a different region from the source instance.
  • Track the usage of your Filestore instances by using Cloud Monitoring, and set up alerts for instances with low usage.
  • Scale down the allocated capacity of Filestore instances that have low usage (see the sketch after this list). You can reduce the capacity of instances in any service tier except the Basic tier.
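
Here's a sketch of adjusting the capacity of an existing instance with the gcloud CLI (the instance name, zone, share name, and capacity are placeholders; whether you can shrink depends on the service tier):

```bash
# Update the capacity of a Filestore instance's file share. The share
# name must match the name that the instance was created with.
gcloud filestore instances update my-instance \
    --zone=us-central1-a \
    --file-share=name=my_share,capacity=2TB
```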
