Reads
This page describes the types of read requests you can send to Bigtable, discusses performance implications, and presents a few recommendations for specific types of queries. Before you read this page, you should be familiar with the overview of Bigtable.
Overview
Read requests to Bigtable stream back the contents of the requested rows in key order, meaning they are returned in the order in which they are stored. You can read any writes that have returned a response.
The queries that your table supports should help determine the type of read that is best for your use case. Bigtable read requests fall into two general categories:
- Reading a single row
- Scans, or reading multiple rows
Reads are atomic at the row level. This means that when you send a read request for a row, Bigtable returns either the entire row or, in the event the request fails, none of the row. A partial row is never returned unless you specifically request one.
We strongly recommend that you use our Cloud Bigtable client libraries to read data from a table instead of calling the API directly. Code samples showing how to send read requests are available in multiple languages. All read requests make the `ReadRows` API call.
Reading data with Data Boost serverless compute
Bigtable Data Boost lets you run batch read jobs and queries without affecting daily application traffic. Data Boost is a serverless compute service that you can use to read your Bigtable data while your core application uses your cluster's nodes for compute.
Data Boost is ideal for scans and is not recommended for single-row reads. You can't use Data Boost for reverse scans. For more information and eligibility criteria, see the Data Boost overview.
Single-row reads
You can request a single row based on the row key. Single-row reads, also known as point reads, are not compatible with Data Boost. Code samples are available for several variations of single-row reads.
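As an illustration, here's a minimal sketch of a point read that uses the Bigtable client library for Java; the project, instance, table, and row key values are placeholders, not values from this page's examples:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;
import com.google.cloud.bigtable.data.v2.models.RowCell;

public class PointReadExample {
  public static void main(String[] args) throws Exception {
    // Placeholder IDs; replace with your own project, instance, and table.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // readRow returns the entire row, or null if the row doesn't exist.
      Row row = client.readRow("my-table", "123ABC#2022-05-02");
      if (row != null) {
        for (RowCell cell : row.getCells()) {
          System.out.printf("%s:%s=%s%n",
              cell.getFamily(), cell.getQualifier().toStringUtf8(), cell.getValue().toStringUtf8());
        }
      }
    }
  }
}
```

Because reads are atomic at the row level, `readRow` returns the whole row or, if the key doesn't exist, `null`; it never returns part of a row.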
Scans
Scans are the most common way to read Bigtable data. You can read a range of contiguous rows or multiple ranges of rows from Bigtable by specifying a row key prefix or by specifying beginning and ending row keys. Code samples are available for several variations of scans.
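Here's a hedged sketch of both variations with the Java client; the table and key values are placeholder assumptions:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class ScanExamples {
  public static void main(String[] args) throws Exception {
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // Scan all rows whose keys start with a prefix.
      Query prefixScan = Query.create("my-table").prefix("123ABC#");
      for (Row row : client.readRows(prefixScan)) {
        System.out.println(row.getKey().toStringUtf8());
      }

      // Scan a contiguous range: start key inclusive, end key exclusive.
      Query rangeScan = Query.create("my-table").range("123ABC#2022-01-01", "123ABC#2022-12-31");
      for (Row row : client.readRows(rangeScan)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```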
Reverse scans
Reverse scans let you read a range of rows backward, either by specifying a row key prefix or by specifying a range of rows. A row key prefix is used as the starting point of the scan, which reads backward from there. If you specify a range of rows, the end row key is used as the starting point of the scan.
Scanning in reverse order can be useful for the following scenarios:
- You want to find an event (row) and then read the previous N events.
- You want to find the highest value prior to a given value. This can be helpful when you store time series data using a timestamp as a row key suffix.
Reverse scans are less efficient than forward scans. In general, design your row keys so that most scans are forward. Use reverse scans for short scans, such as 50 rows or fewer, to maintain low-latency response times.
To scan in reverse, you set the `reversed` field of the `ReadRowsRequest` to `true`. The default is `false`.
Reverse scans are available when you use the following client libraries:
- Bigtable client library for C++ version 2.18.0 or later
- Bigtable client library for Go version 1.21.0 or later
- Bigtable client library for Java version 2.24.1 or later
- Bigtable HBase client for Java version 2.10.0 or later
For code samples demonstrating how to use reverse scans, see Scan in reverse.
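In the Java client library, for example, the flag is exposed on the `Query` builder. The following is a minimal sketch with placeholder IDs:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class ReverseScanExample {
  public static void main(String[] args) throws Exception {
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // Read the rows in the prefix range in reverse key order.
      // Keep reverse scans short (for example, 50 rows or fewer).
      Query query = Query.create("my-table").prefix("123ABC#").reversed(true).limit(50);
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```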
Use case examples
The following examples show how you can use reverse scans to find the last time a customer changed their password and to find price fluctuations for a product around a particular day.
Password resets
Suppose that your row keys each contain a customer ID and a date, in the format `123ABC#2022-05-02`, and that one of the columns is `password_reset`, which stores the hour when the password was reset. Bigtable automatically stores the data lexicographically, like the following. Note that the `password_reset` column does not exist for rows (days) when the password was not reset.
`123ABC#2022-02-12,password_reset:03`
`123ABC#2022-04-02,password_reset:11`
`123ABC#2022-04-14`
`123ABC#2022-05-02`
`223ABC#2022-05-22`
If you want to find the last time that customer `123ABC` reset their password, you can scan in reverse over the range `123ABC#` to `123ABC#<DATE>`, using today's date or a date in the future, for all rows that contain the column `password_reset`, with a row limit of 1.
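A sketch of that query in Java might look like the following; the future end date and the project, instance, and table IDs are placeholder assumptions:

```java
import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class LastPasswordReset {
  public static void main(String[] args) throws Exception {
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      Query query = Query.create("my-table")
          // Scan the customer's rows backward, ending at a date in the future.
          .range("123ABC#", "123ABC#2099-12-31")
          .reversed(true)
          // Keep only rows that have a password_reset cell.
          .filter(FILTERS.qualifier().exactMatch("password_reset"))
          // Stop after the first (most recent) matching row.
          .limit(1);
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```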
Price changes
In this example, your row keys contain values for product, model, and timestamp, and one of the columns contains the price for the product and model at a given time.
`productA#model2#1675604471,price:82.63`
`productA#model2#1676219411,price:82.97`
`productA#model2#1677681011,price:83.15`
`productA#model2#1680786011,price:83.99`
`productA#model2#1682452238,price:83.12`
If you want to find price fluctuations surrounding the price on February 14, 2023, even though a row key for that particular date doesn't exist in the table, you can do a forward scan starting from row key `productA#model2#1676376000` for N rows, and then do a reverse scan for the same number of rows from the same starting row. The two scans give you the prices before and after the given time.
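Here's one way to sketch the paired scans in Java, assuming N = 10 and placeholder project, instance, and table IDs:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Range.ByteStringRange;
import com.google.cloud.bigtable.data.v2.models.Row;

public class PricesAroundTimestamp {
  public static void main(String[] args) throws Exception {
    String start = "productA#model2#1676376000"; // February 14, 2023
    int n = 10;
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // Forward scan: the N rows at or after the synthetic start key.
      Query forward = Query.create("my-table")
          .range(ByteStringRange.unbounded().startClosed(start))
          .limit(n);
      for (Row row : client.readRows(forward)) {
        System.out.println("after:  " + row.getKey().toStringUtf8());
      }

      // Reverse scan: the N rows strictly before the same start key.
      Query backward = Query.create("my-table")
          .range(ByteStringRange.unbounded().endOpen(start))
          .reversed(true)
          .limit(n);
      for (Row row : client.readRows(backward)) {
        System.out.println("before: " + row.getKey().toStringUtf8());
      }
    }
  }
}
```

The reverse scan uses an open end bound so that the starting row isn't counted twice; the `limit(n)` on each query keeps both scans short.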
Filtered reads
If you need only rows that contain specific values, or only parts of rows, you can use a filter with your read request. Filters let you be highly selective about the data that is returned.
Filters also let you make sure that reads match the garbage collection policies that your table is using. This is particularly useful if you frequently write new timestamped cells to existing columns. Because garbage collection can take up to a week to remove expired data, using a timestamp range filter to read data can ensure you don't read more data than you need.
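For example, if your garbage collection policy expires cells by age, a timestamp range filter like the one in this hedged Java sketch (placeholder IDs and an assumed one-day window) keeps expired-but-not-yet-removed cells out of your results:

```java
import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import java.util.concurrent.TimeUnit;

public class TimestampRangeRead {
  public static void main(String[] args) throws Exception {
    // Bigtable cell timestamps are expressed in microseconds.
    long endMicros = System.currentTimeMillis() * 1000;
    long startMicros = endMicros - TimeUnit.DAYS.toMicros(1);
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      Query query = Query.create("my-table")
          .prefix("123ABC#")
          // Return only cells written within the last day.
          .filter(FILTERS.timestamp().range().startClosed(startMicros).endOpen(endMicros));
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```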
The overview of filters provides detailed explanations of the types of filters that you can use. Using filters shows examples in multiple languages.
Read data from an authorized view
To read data from an authorized view, you must use one of the following:
- gcloud CLI
- Bigtable client for Java
The other Bigtable client libraries don't yet support view access.
Any method that calls the `ReadRows` or `SampleRowKeys` method of the Bigtable Data API is supported. You provide the authorized view ID in addition to the table ID when you create your client.
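With the Java client, a sketch might look like the following, assuming a client version recent enough to support authorized views; the project, instance, table, and view IDs are placeholders, and the authorized view ID is supplied as the query target:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.AuthorizedViewId;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class AuthorizedViewRead {
  public static void main(String[] args) throws Exception {
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      // Target the authorized view (table ID plus view ID) instead of the table alone.
      Query query = Query.create(AuthorizedViewId.of("my-table", "my-authorized-view"))
          .prefix("123ABC#");
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```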
Reads and performance
Reads that use filters are slower than reads without filters, and they increase CPU utilization. On the other hand, they can significantly reduce the amount of network bandwidth that you use, by limiting the amount of data that is returned. In general, filters should be used to control throughput efficiency, not latency.
If you want to optimize your read performance, consider the following strategies:
Restrict the rowset as much as possible. Limiting the number of rows that your nodes have to scan is the first step toward improving time to first byte and overall query latency. If you don't restrict the rowset, Bigtable almost certainly has to scan your entire table. This is why we recommend that you design your schema in a way that lets your most common queries use a restricted rowset.
For additional performance tuning after you've restricted the rowset, try adding a basic filter. Restricting the set of columns or the number of versions returned generally doesn't increase latency and can sometimes help Bigtable seek more efficiently past irrelevant data in each row.
If you want to fine-tune your read performance even more after the first two strategies, consider using a more complicated filter. You might try this for a few reasons:
- You're still getting back a lot of data you don't want.
- You want to simplify your application code by pushing the query down into Bigtable.
Be aware, however, that filters requiring conditions, interleaves, or regular expression matching on large values tend to do more harm than good if they allow most of the scanned data through. This harm comes in the form of increased CPU utilization in your cluster without large savings client-side.
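Combining the first two strategies, a restricted rowset plus a basic filter, might look like the following Java sketch; the row key prefix, column family name, and other IDs are placeholder assumptions:

```java
import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class RestrictedFilteredScan {
  public static void main(String[] args) throws Exception {
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      Query query = Query.create("my-table")
          // Strategy 1: restrict the rowset to a narrow key range.
          .prefix("123ABC#")
          // Strategy 2: a basic filter that restricts columns and versions.
          .filter(FILTERS.chain()
              .filter(FILTERS.family().exactMatch("stats"))
              .filter(FILTERS.limit().cellsPerColumn(1)));
      for (Row row : client.readRows(query)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```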
In addition to these strategies, avoid reading a large number of non-contiguous row keys or row ranges in a single read request. When you request hundreds of row keys or row ranges in a single request, Bigtable scans the table and reads the requested rows sequentially. This lack of parallelism affects the overall latency, and any reads that hit a hot node can increase the tail latency. The more row ranges requested, the longer the read takes to complete. If this latency is unacceptable, you should instead send multiple concurrent requests that each retrieve fewer row ranges.
In general, reading more row ranges in a single request optimizes throughput, but not latency. Reading fewer row ranges in multiple concurrent requests optimizes latency, but not throughput. The right balance between latency and throughput depends on your application's requirements, and you can find it by adjusting the number of concurrent read requests and the number of row ranges in each request.
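One hedged way to express the concurrent approach in Java is to shard the row ranges across a few parallel requests; the shard boundaries, thread count, and IDs below are arbitrary placeholder assumptions, and the sketch relies on `BigtableDataClient` being documented as safe to share across threads:

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedRangeReads {
  public static void main(String[] args) throws Exception {
    // Arbitrary example shards; in practice, group nearby ranges together.
    List<String[]> shards = List.of(
        new String[] {"123ABC#", "123ABD#"},
        new String[] {"223ABC#", "223ABD#"},
        new String[] {"323ABC#", "323ABD#"});
    ExecutorService executor = Executors.newFixedThreadPool(shards.size());
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (String[] shard : shards) {
        // Each concurrent request carries fewer row ranges, which favors latency.
        Callable<List<String>> task = () -> {
          List<String> keys = new ArrayList<>();
          Query query = Query.create("my-table").range(shard[0], shard[1]);
          for (Row row : client.readRows(query)) {
            keys.add(row.getKey().toStringUtf8());
          }
          return keys;
        };
        futures.add(executor.submit(task));
      }
      for (Future<List<String>> future : futures) {
        future.get().forEach(System.out::println);
      }
    } finally {
      executor.shutdown();
    }
  }
}
```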
Large rows
Bigtable limits the size of a row to 256 MB, but it's possible to accidentally exceed that maximum. If you need to read a row that has grown larger than the limit, you can paginate your request and use a cells-per-row limit filter and a cells-per-row offset filter. Be aware that if a write arrives for the row between the paginated read requests, the read might not be atomic.
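A sketch of that pagination pattern in Java, assuming a placeholder page size and IDs:

```java
import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;

public class PaginatedLargeRowRead {
  public static void main(String[] args) throws Exception {
    int pageSize = 10000; // Cells per request; tune to your row shape.
    try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
      int offset = 0;
      while (true) {
        // Skip the cells already read, then take the next page of cells.
        Row page = client.readRow(
            "my-table",
            "123ABC#2022-05-02",
            FILTERS.chain()
                .filter(FILTERS.offset().cellsPerRow(offset))
                .filter(FILTERS.limit().cellsPerRow(pageSize)));
        if (page == null || page.getCells().isEmpty()) {
          break; // No cells left to read.
        }
        page.getCells().forEach(cell -> System.out.println(cell.getValue().toStringUtf8()));
        if (page.getCells().size() < pageSize) {
          break; // Last (partial) page.
        }
        offset += pageSize;
      }
    }
  }
}
```

As noted above, a write that lands between two of these paginated requests means the pages might not reflect a single consistent version of the row.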
What's next
- Implement counters by using aggregate cells.
- Read an overview of filters.
- Look at code samples showing how to use filters.
- Read about the types of write requests you can send to Bigtable.
- Use the Bigtable emulator.