Suppose you want to visit a friend. Should you travel by airplane? It depends where you're going. If your friend lives nearby, there are probably better ways to get to your friend's house. But if your friend lives very far away, traveling by airplane is probably the fastest way to go.
Using Cloud Bigtable is a lot like traveling by airplane. Cloud Bigtable excels at handling very large amounts of data (terabytes or petabytes) over a relatively long period of time (hours or days). It's not designed to work with small amounts of data in short bursts. If you test Cloud Bigtable for 30 seconds with a few GB of data, you won't get an accurate picture of its performance.
Keep reading to learn more about the performance you can expect from Cloud Bigtable.
Performance for typical workloads
Under a typical workload, Cloud Bigtable delivers highly predictable performance. When everything is running smoothly, you can expect to achieve the following performance for each node in your Cloud Bigtable cluster, depending on which type of storage your cluster uses:
|SSD||10,000 QPS1 @ 6 ms||10,000 QPS @ 6 ms||220 MB/s|
|HDD||500 QPS @ 200 ms||10,000 QPS @ 50 ms||180 MB/s|
|1 Queries per second. A query is a read or write operation against a single row.|
These performance numbers are guidelines, not hard and fast rules. Per-node performance may vary based on your workload and the typical size of each row in your table.
In general, a cluster's performance increases linearly as you add nodes to the cluster. For example, if you create an SSD cluster with 10 nodes, the cluster can support up to 100,000 QPS for a typical workload, with 6 ms latency for each read and write operation.
There are several factors that can result in slower performance:
- The table's schema is not designed correctly. To get good performance from Cloud Bigtable, it's essential to design a schema that allows reads and writes to be evenly distributed across the Cloud Bigtable cluster. See "Designing Your Schema" for more information.
- The workload isn't appropriate for Cloud Bigtable. If you test with a small amount (< 300 GB) of data, or if you test for a very short period of time (seconds rather than minutes or hours), Cloud Bigtable won't be able to balance your data in a way that gives you good performance. It needs time to learn your access patterns, and it needs large enough shards of data to make use of all of the nodes in your cluster.
- The Cloud Bigtable cluster doesn't have enough nodes. If your Cloud Bigtable cluster is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.
- The Cloud Bigtable cluster was scaled up very recently. After you add nodes to a cluster, it can take up to 20 minutes under load before you see a significant improvement in the cluster's performance.
- The Cloud Bigtable cluster uses HDD disks. In most cases, your cluster should use SSD disks, which have significantly better performance than HDD disks. See "Choosing between SSD and HDD storage" for details.
- There are issues with the network connection. Network issues can reduce throughput and cause reads and writes to take longer than usual. In particular, you'll see issues if your clients are not running in the same zone as your Cloud Bigtable cluster.
Because different workloads can cause performance to vary, you should perform tests with your own workloads to obtain the most accurate benchmarks.
How Cloud Bigtable optimizes your data over time
To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets, which can be moved between nodes in your Cloud Bigtable cluster. This storage method allows Cloud Bigtable to use two different strategies for optimizing your data over time:
- Cloud Bigtable tries to store roughly the same amount of data on each Cloud Bigtable node.
- Cloud Bigtable tries to distribute reads and writes equally across all Cloud Bigtable nodes.
Sometimes these strategies conflict with one another. For example, if one tablet's rows are read extremely frequently, Cloud Bigtable might store that tablet on its own node, even though this causes some nodes to store more data than others.
As part of this process, Cloud Bigtable may also split a tablet into two or more smaller tablets, either to reduce a tablet's size or to isolate hot rows within an existing tablet.
The following sections explain each of these strategies in more detail.
Distributing the amount of data across nodes
As you write data to a Cloud Bigtable table, Cloud Bigtable shards the table's data into tablets. Each tablet contains a contiguous range of rows within the table.
If you have written only a small amount of data to the table, Cloud Bigtable will store all of the tablets on a single node within your cluster:
As more tablets accumulate, Cloud Bigtable will move some of them to other nodes in the cluster so that the amount of data is balanced more evenly across the cluster:
Distributing reads and writes evenly across nodes
If you've designed your schema correctly, then reads and writes should be distributed fairly evenly across your entire table. However, there are some cases where you can't avoid accessing certain rows more frequently than others. Cloud Bigtable helps you deal with these cases by taking reads and writes into account when it balances tablets across nodes.
For example, suppose that 25% of reads are going to a small number of tablets within a cluster, and reads are spread evenly across all other tablets:
Cloud Bigtable will redistribute the existing tablets so that reads are spread as evenly as possible across the entire cluster:
Testing performance with Cloud Bigtable
If you're running a performance test that depends upon Cloud Bigtable, be sure to follow these steps as you plan and execute your test:
- Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use 100 GB of data per node.
- Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.
- Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
Troubleshooting performance issues
If you think that Cloud Bigtable may be creating a performance bottleneck in your application, be sure to check all of the following:
- Try commenting out the code that performs Cloud Bigtable reads and writes. If the performance issue disappears, then you're probably using Cloud Bigtable in a way that results in suboptimal performance. If the performance issue persists, the issue is probably not related to Cloud Bigtable.
- Ensure that you're creating and reusing a single connection. Opening a
connection to Cloud Bigtable is a relatively expensive operation. If you're
using the Cloud Bigtable HBase client for Java, make sure you're creating and
sharing one long-lived
Connectionobject across all of the threads in your application. The
Connectionobject will automatically handle multiplexing across multiple simultaneous threads. Similarly, if you're using another client, make sure you're creating and reusing a single connection to Cloud Bigtable.
Make sure you're reading and writing many different rows in your table. Cloud Bigtable performs best when reads and writes are evenly distributed throughout your table, which helps Cloud Bigtable distribute the workload across all of the nodes in your cluster. If reads and writes cannot be spread across all of your Cloud Bigtable nodes, performance will suffer.
If you find that you're reading and writing only a small number of rows, you may need to redesign your schema so that reads and writes are more evenly distributed.
Verify that you see approximately the same performance for reads and writes. If you find that reads are much faster than writes, you may be trying to read rowkeys that do not exist, or a large range of rowkeys that contains only a small number of rows.
To make a valid comparison between reads and writes, you should aim for at least 90% of your reads to return valid results. Also, if you're reading a large range of rowkeys, measure performance based on the actual number of rows in that range, rather than the maximum number of rows that could exist.