When you create a Cloud Bigtable instance and cluster, you choose whether to store the cluster's data on solid-state drives (SSD) or hard disk drives (HDD):
- SSD storage is the most efficient and cost-effective choice for most use cases.
- HDD storage is sometimes appropriate for very large data sets (at least 10 TB) that are not latency-sensitive or are infrequently accessed.
Regardless of which type of storage you choose, your data is stored on a distributed, replicated file system that spans many physical drives.
The guidelines on this page can help you choose between SSD and HDD.
When in doubt, choose SSD storage
There are several reasons why it's usually best to use SSD storage for your Cloud Bigtable cluster:
- SSD is significantly faster and has more predictable performance than HDD. In a Cloud Bigtable cluster, SSD storage delivers 6 ms latencies for both reads and writes for 99% of all requests. By contrast, HDD storage delivers 200 ms read latencies and 50 ms write latencies on the same benchmark.
- HDD throughput is much more limited than SSD throughput. In a cluster that uses HDD storage, it's easy to reach the maximum throughput before CPU usage reaches 100%. To increase throughput, you must add more nodes, but the cost of the additional nodes can easily exceed your savings from using HDD storage. SSD storage does not have this limitation, because it offers much more throughput per node—generally, a cluster that uses SSD storage reaches maximum throughput only when it is using all available CPU and memory.
- Individual row reads on HDD are very slow. Because of disk seek time, HDD storage supports only 5% of the read QPS of SSD storage. Large multi-row scans, however, are not affected as severely.
- The cost savings from HDD are minimal, relative to the cost of the nodes in your Cloud Bigtable cluster, unless you're storing very large amounts of data. For this reason, as a rule of thumb, you shouldn't consider using HDD storage unless you're storing at least 10 TB of data.
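The tradeoff between node cost and storage cost can be sketched with some simple arithmetic. The example below is only an illustration: the 5% read-QPS figure comes from the list above, but every price and the per-node SSD read rate are hypothetical placeholders chosen for the example, not published numbers. The function names are likewise invented for this sketch.

```python
import math

# From the text above: HDD supports only 5% of the read QPS of SSD.
HDD_READ_QPS_PERCENT = 5

# Hypothetical placeholders, NOT real prices or benchmark results.
HYPOTHETICAL_NODE_PRICE = 500.0        # per node, per month
HYPOTHETICAL_SSD_PRICE_PER_TB = 170.0  # per TB, per month
HYPOTHETICAL_HDD_PRICE_PER_TB = 26.0   # per TB, per month
HYPOTHETICAL_SSD_READ_QPS_PER_NODE = 10_000


def nodes_for_read_qps(target_qps, storage):
    """Return the node count needed to serve target_qps single-row reads."""
    if storage == "SSD":
        per_node = HYPOTHETICAL_SSD_READ_QPS_PER_NODE
    elif storage == "HDD":
        per_node = HYPOTHETICAL_SSD_READ_QPS_PER_NODE * HDD_READ_QPS_PERCENT / 100
    else:
        raise ValueError("storage must be 'SSD' or 'HDD'")
    return max(1, math.ceil(target_qps / per_node))


def monthly_cost(target_qps, data_tb, storage):
    """Return node cost plus storage cost for one month."""
    per_tb = (HYPOTHETICAL_SSD_PRICE_PER_TB if storage == "SSD"
              else HYPOTHETICAL_HDD_PRICE_PER_TB)
    nodes = nodes_for_read_qps(target_qps, storage)
    return nodes * HYPOTHETICAL_NODE_PRICE + data_tb * per_tb


if __name__ == "__main__":
    # Compare both storage types at a fixed read load and growing data sizes.
    for data_tb in (1, 10, 100):
        ssd = monthly_cost(2_000, data_tb, "SSD")
        hdd = monthly_cost(2_000, data_tb, "HDD")
        print(f"{data_tb} TB: SSD {ssd:.0f}, HDD {hdd:.0f}")
```

With these placeholder inputs, the extra nodes needed to serve reads from HDD outweigh the storage savings at 1 TB, and HDD becomes cheaper only at the largest data size. Substitute current prices and your own measured throughput before drawing any conclusions.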
Use cases for HDD storage
HDD storage is suitable for use cases that meet the following criteria:
- You expect to store at least 10 TB of data.
- You will not use the data to back a user-facing or latency-sensitive application.
- You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.
For example, if you plan to store extensive historical data for a large number of remote-sensing devices and then use the data to generate daily reports, the cost savings for HDD storage may justify the performance tradeoff. On the other hand, if you plan to use the data to display a real-time dashboard, it probably would not make sense to use HDD storage—reads would be much more frequent in this case, and reads are much slower with HDD storage.
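The three criteria above can be written down as a small decision helper. This is a minimal sketch of the rule of thumb on this page, not an official sizing tool; the `Workload` class and `recommend_storage` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Rule-of-thumb threshold from the text above.
MIN_HDD_DATA_TB = 10


@dataclass
class Workload:
    data_tb: float           # expected amount of stored data, in TB
    latency_sensitive: bool  # backs a user-facing or latency-sensitive app
    mostly_batch: bool       # mostly batch scans and writes, few random reads


def recommend_storage(workload: Workload) -> str:
    """Return "HDD" only when all three criteria hold; otherwise "SSD"."""
    if (workload.data_tb >= MIN_HDD_DATA_TB
            and not workload.latency_sensitive
            and workload.mostly_batch):
        return "HDD"
    return "SSD"


# Historical sensor data used for daily reports: all three criteria hold.
daily_reports = Workload(data_tb=50, latency_sensitive=False, mostly_batch=True)

# The same data behind a real-time dashboard: frequent, latency-sensitive reads.
dashboard = Workload(data_tb=50, latency_sensitive=True, mostly_batch=False)
```

Here `recommend_storage(daily_reports)` returns `"HDD"` and `recommend_storage(dashboard)` returns `"SSD"`, which matches the two scenarios described above. The helper defaults to SSD whenever any criterion fails, in keeping with the "when in doubt" guidance.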
Switching between SSD and HDD storage
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another. Keep in mind that migrating an entire instance takes time and requires more Cloud Bigtable nodes than your instance's cluster normally uses.
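To make the copy step concrete, here is a minimal single-process sketch. It is a simplification, not the Cloud Dataflow or Hadoop MapReduce job recommended above, and it is realistic only for small tables. It assumes table objects that behave like those in the `google-cloud-bigtable` Python client (`read_rows()`, `direct_row()`, `mutate_rows()`, and rows exposing `row_key` and `cells`), and it assumes the destination table and its column families already exist. The `copy_table` and `_flush` names and the batch size are hypothetical.

```python
def _flush(dest_table, batch):
    """Send one batch of row mutations and fail loudly on any error."""
    statuses = dest_table.mutate_rows(batch)
    # Each status carries a gRPC code; 0 means the row was written.
    failed = [status for status in statuses if status.code != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} of {len(batch)} rows failed to copy")
    return len(batch)


def copy_table(source_table, dest_table, batch_size=500):
    """Copy every cell from source_table to dest_table; return rows copied."""
    batch = []
    copied = 0
    for row in source_table.read_rows():
        new_row = dest_table.direct_row(row.row_key)
        # row.cells maps family -> column qualifier -> list of cell versions.
        for family_id, columns in row.cells.items():
            for qualifier, cells in columns.items():
                for cell in cells:
                    # Preserve the original timestamp of each cell version.
                    new_row.set_cell(family_id, qualifier, cell.value,
                                     timestamp=cell.timestamp)
        batch.append(new_row)
        if len(batch) >= batch_size:
            copied += _flush(dest_table, batch)
            batch = []
    if batch:
        copied += _flush(dest_table, batch)
    return copied
```

In this sketch, the source and destination would be tables in the old and new instances, for example one backed by HDD and one by SSD. Rows with many cells may need a smaller `batch_size` to stay within per-request mutation limits, and a full-instance migration is better served by a distributed job.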