When you create a Cloud Bigtable instance, you choose whether its clusters store data on solid-state drives (SSD) or hard disk drives (HDD):
- SSD storage is the most efficient and cost-effective choice for most use cases.
- HDD storage is sometimes appropriate for very large data sets (>10 TB) that are not latency-sensitive or are infrequently accessed.
Regardless of which type of storage you choose, your data is stored on a distributed, replicated file system that spans many physical drives.
The guidelines on this page can help you choose between SSD and HDD.
When in doubt, choose SSD storage
There are several reasons why it's usually best to use SSD storage for your Cloud Bigtable cluster:
- SSD is significantly faster and has more predictable performance than HDD. In a Cloud Bigtable cluster, SSD storage delivers 6 ms latencies for both reads and writes for 99% of all requests. By contrast, HDD storage delivers 200 ms read latencies and 50 ms write latencies on the same benchmark.
- HDD throughput is much more limited than SSD throughput. In a cluster that uses HDD storage, it's easy to reach the maximum throughput before CPU usage reaches 100%. To increase throughput, you must add more nodes, but the cost of the additional nodes can easily exceed your savings from using HDD storage. SSD storage does not have this limitation, because it offers much more throughput per node—generally, a cluster that uses SSD storage reaches maximum throughput only when it is using all available CPU and memory.
- Individual row reads on HDD are very slow. Because of disk seek time, HDD storage supports only 5% as many row reads per second as SSD storage. Large multi-row scans, however, are not as adversely impacted.
- The cost savings from HDD are minimal, relative to the cost of the nodes in your Cloud Bigtable cluster, unless you're storing very large amounts of data. For this reason, as a rule of thumb, you shouldn't consider using HDD storage unless you're storing at least 10 TB of data.
One potential drawback of SSD storage is that it requires more nodes in your clusters based on the amount of data that you store. In practice, though, you might need those extra nodes so that your clusters can keep up with incoming traffic, not only to support the amount of data that you're storing.
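The cost tradeoff described above can be sketched as simple arithmetic. The following is a minimal illustration, not an official pricing calculator: the `ClusterPlan` class, the `hdd_savings` function, and every price and node count in the example are hypothetical placeholders, so substitute the current prices for your region before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterPlan:
    """One candidate cluster configuration. All prices are caller-supplied."""
    nodes: int
    stored_tb: float
    node_price_per_month: float        # price of one node per month
    storage_price_per_tb_month: float  # price of 1 TB stored per month

    def monthly_cost(self) -> float:
        # Total cost = node cost + storage cost.
        return (self.nodes * self.node_price_per_month
                + self.stored_tb * self.storage_price_per_tb_month)


def hdd_savings(ssd: ClusterPlan, hdd: ClusterPlan) -> float:
    """Positive when the HDD plan is cheaper, negative when it costs more."""
    return ssd.monthly_cost() - hdd.monthly_cost()


# Placeholder prices in arbitrary currency units (NOT real prices).
NODE, SSD_TB, HDD_TB = 100.0, 10.0, 2.0

# Small data set, same node count: the saving is tiny next to the node cost.
small_ssd = ClusterPlan(3, 1, NODE, SSD_TB)
small_hdd = ClusterPlan(3, 1, NODE, HDD_TB)
print(hdd_savings(small_ssd, small_hdd))

# Same data, but HDD needs one extra node to reach the required throughput.
small_hdd_more_nodes = ClusterPlan(4, 1, NODE, HDD_TB)
print(hdd_savings(small_ssd, small_hdd_more_nodes))

# Large data set: the storage price difference starts to dominate.
large_ssd = ClusterPlan(3, 50, NODE, SSD_TB)
large_hdd = ClusterPlan(3, 50, NODE, HDD_TB)
print(hdd_savings(large_ssd, large_hdd))
```

With these placeholder inputs, the saving is small for the small data set, turns negative as soon as the HDD cluster needs an extra node, and becomes substantial only when the stored volume is large.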
Use cases for HDD storage
HDD storage is suitable for use cases that meet the following criteria:
- You expect to store at least 10 TB of data.
- You will not use the data to back a user-facing or latency-sensitive application.
- Your workload falls into one of the following categories:
  - Batch workloads with scans and writes, and no more than occasional random reads of a small number of rows.
  - Data archival, where you write very large amounts of data and rarely read that data.
For example, if you plan to store extensive historical data for a large number of remote-sensing devices and then use the data to generate daily reports, the cost savings for HDD storage might justify the performance tradeoff. On the other hand, if you plan to use the data to display a real-time dashboard, it probably would not make sense to use HDD storage—reads would be much more frequent in this case, and reads are much slower with HDD storage.
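The criteria above can be expressed as a small checklist function. This is a sketch only: the `Workload` enum, the `recommend_storage` function, and the 50 TB data size in the example are hypothetical names and inputs chosen for illustration. A result of `"HDD"` means HDD storage is a candidate worth evaluating, not that it is guaranteed to be the better choice.

```python
from enum import Enum


class Workload(Enum):
    BATCH_SCANS_AND_WRITES = "batch"  # scans and writes, only occasional random reads
    ARCHIVAL = "archival"             # very large writes, data rarely read
    OTHER = "other"                   # anything else, e.g. frequent random reads


MIN_HDD_DATA_TB = 10  # rule-of-thumb threshold from the criteria above


def recommend_storage(data_tb: float, latency_sensitive: bool, workload: Workload) -> str:
    """Return "HDD" only when all three criteria hold; otherwise "SSD"."""
    hdd_ok = (
        data_tb >= MIN_HDD_DATA_TB
        and not latency_sensitive
        and workload in (Workload.BATCH_SCANS_AND_WRITES, Workload.ARCHIVAL)
    )
    return "HDD" if hdd_ok else "SSD"


# Historical sensor data used to generate daily reports (size is a made-up input).
print(recommend_storage(50, latency_sensitive=False,
                        workload=Workload.BATCH_SCANS_AND_WRITES))

# The same data backing a real-time dashboard with frequent reads.
print(recommend_storage(50, latency_sensitive=True, workload=Workload.OTHER))
```

The function defaults to SSD whenever any criterion fails, which mirrors the "when in doubt, choose SSD" guidance.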
Switching between SSD and HDD storage
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can use a Cloud Dataflow or Hadoop MapReduce job to copy the data from one instance to another. Keep in mind that migrating an entire instance takes time, and you might need to add nodes to your Cloud Bigtable clusters before you migrate your instance.