Tiered storage overview

This page describes how tiered storage works in Spanner. Tiered storage is supported in both GoogleSQL-dialect databases and PostgreSQL-dialect databases.

Spanner tiered storage is a fully-managed storage feature that lets you choose whether to store your data on solid-state drives (SSD) or hard disk drives (HDD). By default, when you're not using tiered storage, your data is stored on SSD storage. Depending on how often you use or access the data, you might consider using tiered storage and storing data on both SSD and HDD storage.

  • SSD storage is the most performant (higher queries per second) and cost-effective choice for most use cases. You should use it to store active data with high write and read throughput and data that requires low-latency data access.
  • HDD storage is sometimes appropriate for large datasets that aren't latency-sensitive or are infrequently accessed, or when the cost of storage is an important consideration.

Using tiered storage lets you take advantage of both SSD storage, which supports the high performance of active data, and HDD storage, which supports infrequent data access at a lower cost.

Choose between SSD and HDD storage

The following table lists the differences and similarities between SSD and HDD storage. When in doubt, we recommend that you choose SSD storage.

| Feature | SSD storage | HDD storage |
| --- | --- | --- |
| Target use cases | Data that requires high write and read throughput, and low-latency data access | Large datasets that aren't latency-sensitive or are infrequently accessed |
| Expected throughput per node (regional configurations) | Up to 3,500 QPS write; up to 22,500 QPS read | Up to 3,500 QPS write; up to 1,500 QPS read |
| Expected throughput per node (dual-region and multi-region configurations) | Up to 2,700 QPS write; up to 15,000 QPS read | Up to 3,500 QPS write; up to 1,000 QPS read |
| Supported operations | Read, write, update, and delete | Read, write, update, and delete |
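
To make the throughput figures easier to compare, the following sketch estimates a rough lower bound on node count for a given load. The per-node figures are copied from the table above. The `estimate_nodes` helper, the `regional` and `multi_region` keys, and the linear model for mixed read and write load are illustrative assumptions, not an official sizing method.

```python
import math

# Per-node "up to" throughput figures from the table above, in QPS.
# "multi_region" stands for dual-region and multi-region configurations.
THROUGHPUT_QPS = {
    ("regional", "ssd"): {"write": 3500, "read": 22500},
    ("regional", "hdd"): {"write": 3500, "read": 1500},
    ("multi_region", "ssd"): {"write": 2700, "read": 15000},
    ("multi_region", "hdd"): {"write": 3500, "read": 1000},
}


def estimate_nodes(config, storage, read_qps, write_qps):
    """Return a rough lower bound on the number of nodes for a load.

    Assumption (not from the Spanner documentation): reads and writes each
    consume a linear fraction of a node, and the two fractions add up.
    """
    limits = THROUGHPUT_QPS[(config, storage)]
    fraction = read_qps / limits["read"] + write_qps / limits["write"]
    return max(1, math.ceil(fraction))


# The same read-heavy load needs far more nodes when served from HDD.
ssd_nodes = estimate_nodes("regional", "ssd", read_qps=22500, write_qps=0)
hdd_nodes = estimate_nodes("regional", "hdd", read_qps=22500, write_qps=0)
```

Under these assumptions, a read load that a single node serves from SSD needs about 15 nodes when served entirely from HDD, which is why HDD storage suits infrequently accessed data.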

Benefits

Tiered storage offers the following benefits by letting you use both SSD and HDD storage:

  • Significant total cost of ownership reduction: HDD storage provides a lower cost option for large datasets that aren't latency-sensitive or are infrequently accessed.
  • Ease of management: Provides a fully-managed tiering service without the complexity of additional pipelines and split logic.
  • Unified and consistent experience: Provides unified data access and a single set of metrics across hot and (mutable) cold data.
  • Enhanced performance: Improves query performance by organizing your data in different locality groups, which provide data locality and isolation across columns. Data in the same locality group is stored physically close together.

How tiered storage works

By default, when you create a new instance, data is only stored on SSD storage. Similarly, data in existing instances is also only stored on SSD storage.

If you choose to use tiered storage to store some data in HDD storage, you must create a locality group, which is used to define the tiered storage policy for data in your schema. When you create a locality group, you can define the storage type, either ssd or hdd. Optionally, you can also define the amount of time that data is stored on SSD storage before it's moved to HDD storage. After the specified time passes, Spanner migrates the data to HDD storage during its normal compaction cycle, which typically occurs over the course of seven days from the specified time. This is known as an age-based tiered storage policy. When using an age-based tiered storage policy, the minimum amount of time that data must be stored in SSD before it's moved to HDD storage is one hour.
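
The timing of an age-based policy can be sketched as simple date arithmetic. The following example uses only the two figures stated above: the one-hour minimum on SSD and the compaction cycle that typically completes within seven days. The `hdd_migration_window` helper and its parameter names are hypothetical, and the seven-day bound is typical behavior rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

MIN_SSD_AGE = timedelta(hours=1)               # minimum time on SSD, per the text
TYPICAL_COMPACTION_WINDOW = timedelta(days=7)  # typical, not guaranteed


def hdd_migration_window(written_at, ssd_age):
    """Return (eligible_at, typically_done_by) for data written at written_at.

    eligible_at is when the data has spent ssd_age on SSD storage;
    typically_done_by adds the usual seven-day compaction window.
    """
    if ssd_age < MIN_SSD_AGE:
        raise ValueError("an age-based policy requires at least one hour on SSD")
    eligible_at = written_at + ssd_age
    return eligible_at, eligible_at + TYPICAL_COMPACTION_WINDOW


written = datetime(2025, 1, 1, tzinfo=timezone.utc)
eligible, done_by = hdd_migration_window(written, timedelta(days=10))
```

In this example, data written on January 1 with a 10-day policy becomes eligible for HDD storage on January 11 and is typically migrated by January 18.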

After you define your locality groups, you can set the tiered storage policy at the database, table, column, or secondary index level when you create your schema objects. The tiered storage policy determines how and where data is stored. For instructions, see Create and manage locality groups.
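
As an illustration of the levels described above, the following sketch collects GoogleSQL DDL statements as Python strings. The statement syntax and the `storage`, `ssd_to_hdd_spill_timespan`, and `locality_group` option names are written from memory of the GoogleSQL DDL and should be treated as assumptions to verify against Create and manage locality groups. All object names (`spill_to_hdd`, `hdd_only`, `Singers`, `SingersByLocation`) are hypothetical.

```python
# GoogleSQL DDL sketches. Verify the syntax against the Spanner DDL
# reference before use; every object name here is hypothetical.
LOCALITY_GROUP_DDL = [
    # Age-based policy: keep data on SSD for 10 days, then move it to HDD.
    "CREATE LOCALITY GROUP spill_to_hdd "
    "OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '10d')",
    # HDD-only policy.
    "CREATE LOCALITY GROUP hdd_only OPTIONS (storage = 'hdd')",
    # Table-level policy, with a column-level override for Awards.
    """CREATE TABLE Singers (
         SingerId INT64 NOT NULL,
         Name STRING(1024),
         Location STRING(1024),
         Awards ARRAY<STRING(MAX)> OPTIONS (locality_group = 'hdd_only')
       ) PRIMARY KEY (SingerId), OPTIONS (locality_group = 'spill_to_hdd')""",
    # Secondary-index-level policy.
    "CREATE INDEX SingersByLocation ON Singers(Location) "
    "OPTIONS (locality_group = 'hdd_only')",
]

# Each statement would be submitted as a schema update, in this order,
# because the table and index refer to locality groups created earlier.
for statement in LOCALITY_GROUP_DDL:
    print(statement)
```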

Back up and restore

You can back up and restore your data using Spanner backups. The backup contains all storage schema information, including INFORMATION_SCHEMA.LOCALITY_GROUP_OPTIONS, which specifies the storage type of each locality group. To restore a backup that contains locality groups to a new instance, the destination instance must be in the Spanner Enterprise edition or Spanner Enterprise Plus edition.

Data Boost

You can use Spanner Data Boost to access data on SSD or HDD storage. Querying data on HDD storage incurs a higher cost due to increased I/O operations. For more information, see Pricing.

Search indexes

Full-text search and vector indexes inherit the locality group that is set on the database object.

Observability

The following observability features are available for tiered storage.

Cloud Monitoring metrics

Spanner provides the following metrics to help you monitor your tiered storage usage and data using Cloud Monitoring:

  • spanner.googleapis.com/instance/storage/used_bytes (Total storage): Shows the total bytes of data stored on SSD and HDD storage. On the Google Cloud console Spanner Instance and Database System insights page, there's a drop-down menu for Storage type for this metric. Use the drop-down to show the total bytes of data stored on All, HDD only, or SSD only storage.
  • spanner.googleapis.com/instance/storage/combined/limit_bytes: Shows the combined SSD and HDD storage limit.
  • spanner.googleapis.com/instance/storage/combined/limit_bytes_per_processing_unit: Shows the combined SSD and HDD storage limit for each processing unit.
  • spanner.googleapis.com/instance/storage/combined/utilization: Shows the combined SSD and HDD storage utilization, compared against the combined storage limit.
  • spanner.googleapis.com/instance/disk_load: Shows HDD usage as a percentage. If your instance reaches 100% disk load, you experience significantly increased latency.

If you have existing queries that filter existing metrics by storage_class:ssd, you must remove the filter to see your HDD usage.
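
The effect of the storage_class filter can be sketched with a small helper that builds Cloud Monitoring filter strings. The `storage_filter` function is hypothetical, and the `hdd` label value is an assumption inferred from the `ssd` value mentioned above.

```python
USED_BYTES = "spanner.googleapis.com/instance/storage/used_bytes"


def storage_filter(metric_type, storage_class=None):
    """Build a Cloud Monitoring filter string for a Spanner storage metric.

    With storage_class=None the filter matches every storage class, which
    is what you want in order to see SSD and HDD usage together.
    """
    parts = [f'metric.type = "{metric_type}"']
    if storage_class is not None:
        parts.append(f'metric.labels.storage_class = "{storage_class}"')
    return " AND ".join(parts)


all_storage = storage_filter(USED_BYTES)       # SSD and HDD
ssd_only = storage_filter(USED_BYTES, "ssd")   # hides HDD usage
hdd_only = storage_filter(USED_BYTES, "hdd")   # "hdd" value is assumed
```

A query built like `ssd_only` hides HDD usage, so drop the label clause, as in `all_storage`, when you want combined usage.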

To learn more about monitoring your Spanner resources, see Monitor instances with system insights and Monitor instances with Cloud Monitoring.

Information schema

INFORMATION_SCHEMA.LOCALITY_GROUP_OPTIONS contains the list of locality groups and options in your Spanner database. It includes information for the default locality group. For more information, see locality_group_options for GoogleSQL-dialect databases and locality_group_options for PostgreSQL-dialect databases.
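
The following sketch shows one way to turn the rows of this view into a per-group policy summary. The column names `LOCALITY_GROUP_NAME`, `OPTION_NAME`, and `OPTION_VALUE`, and the sample rows, are assumptions for illustration. Check the locality_group_options reference for the actual columns in your dialect.

```python
# Query text for a GoogleSQL-dialect database; column names are assumed.
LOCALITY_GROUP_OPTIONS_SQL = """
SELECT LOCALITY_GROUP_NAME, OPTION_NAME, OPTION_VALUE
FROM INFORMATION_SCHEMA.LOCALITY_GROUP_OPTIONS
ORDER BY LOCALITY_GROUP_NAME, OPTION_NAME
"""


def group_options(rows):
    """Pivot (group, option, value) rows into {group: {option: value}}."""
    policies = {}
    for group, option, value in rows:
        policies.setdefault(group, {})[option] = value
    return policies


# Hypothetical result rows, shaped like the query output above.
sample_rows = [
    ("default", "STORAGE", "ssd"),
    ("spill_to_hdd", "SSD_TO_HDD_SPILL_TIMESPAN", "10d"),
    ("spill_to_hdd", "STORAGE", "ssd"),
]
policies = group_options(sample_rows)
```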

Built-in statistics tables

The following built-in statistics tables are available for databases using tiered storage:

  • SPANNER_SYS.TABLE_SIZES_STATS_1HOUR: Shows HDD and SSD storage usage for each table in your database.
  • SPANNER_SYS.TABLE_SIZES_STATS_PER_LOCALITY_GROUP_1HOUR: Shows HDD and SSD storage usage for each locality group in your database.

For more information, see Table sizes statistics and Query statistics.
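
As a sketch of how you might use these tables, the following example pairs a query against SPANNER_SYS.TABLE_SIZES_STATS_1HOUR with a helper that computes the share of each table's bytes on HDD storage. The column names `INTERVAL_END`, `TABLE_NAME`, `USED_SSD_BYTES`, and `USED_HDD_BYTES` are assumptions to confirm in Table sizes statistics. The `hdd_share` helper and the sample rows are hypothetical.

```python
# Latest hourly snapshot of per-table storage; column names are assumed.
TABLE_SIZES_SQL = """
SELECT TABLE_NAME, USED_SSD_BYTES, USED_HDD_BYTES
FROM SPANNER_SYS.TABLE_SIZES_STATS_1HOUR
WHERE INTERVAL_END = (
  SELECT MAX(INTERVAL_END) FROM SPANNER_SYS.TABLE_SIZES_STATS_1HOUR
)
"""


def hdd_share(rows):
    """Map each table name to the fraction of its bytes stored on HDD."""
    shares = {}
    for table, ssd_bytes, hdd_bytes in rows:
        total = ssd_bytes + hdd_bytes
        # Empty tables report a share of 0.0 instead of dividing by zero.
        shares[table] = hdd_bytes / total if total else 0.0
    return shares


# Hypothetical result rows, shaped like the query output above.
shares = hdd_share([("Singers", 750, 250), ("Albums", 0, 0)])
```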

Pricing

There is no additional charge for using tiered storage. You are charged the standard Spanner pricing for the amount of compute capacity that your instance uses and the amount of storage that your database uses. Data stored on SSD storage and data stored on HDD storage are each billed at their respective storage rate. You aren't charged for moving data between SSD and HDD storage. For more information, see Spanner pricing.
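
The storage portion of the bill described above reduces to a two-term sum. In the following sketch, the `monthly_storage_cost` helper is hypothetical and the rates are placeholder values chosen for easy arithmetic, not actual Spanner prices. Compute capacity is billed separately and isn't modeled here.

```python
def monthly_storage_cost(ssd_gb, hdd_gb, ssd_rate_per_gb, hdd_rate_per_gb):
    """Storage cost when each tier is billed at its own per-GB rate."""
    return ssd_gb * ssd_rate_per_gb + hdd_gb * hdd_rate_per_gb


# Placeholder rates for illustration only; see Spanner pricing for real ones.
SSD_RATE = 0.5
HDD_RATE = 0.125

all_ssd = monthly_storage_cost(1000, 0, SSD_RATE, HDD_RATE)
tiered = monthly_storage_cost(100, 900, SSD_RATE, HDD_RATE)
```

With these placeholder rates, keeping 100 GB of active data on SSD and moving 900 GB to HDD lowers the storage term from 500.0 to 162.5. Because moving data between tiers isn't charged, the formula has no migration term.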
