Filters
When you read data from Bigtable, you can read specific rows or ranges of rows. However, you don't always need all of the data in all of the rows. You might only need rows that contain a specific value in their row key, or cells within a specific column family.
To limit the results of a read request, include filters in the request. A filter is applied to the data before the response is sent, reducing the amount of data that is returned. As a result, using filters can mean lower network costs and higher throughput. This page provides an overview of how Bigtable filters work and a list of available filters.
For details and code samples for each filter, see Filter examples.
How filters work
When your read request includes a filter, Bigtable retrieves a row or a range of rows from your table. For each of the input rows that it retrieves, Bigtable evaluates the row using your filter, then generates an output row based on the filter results.
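The evaluation order described above can be sketched with a small in-memory model. This is an illustration only, not the Bigtable client API: `Cell`, `read_rows`, and the sample table are hypothetical names invented for this sketch, and row keys are modeled as plain strings.

```python
from typing import Callable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    family: str
    qualifier: str
    value: bytes


Row = Tuple[str, List[Cell]]                      # (row key, cells)
CellFilter = Callable[[List[Cell]], List[Cell]]   # hypothetical filter type


def read_rows(table: List[Row], start: str, end: str,
              cell_filter: CellFilter) -> List[Row]:
    """Select rows with start <= key < end, then evaluate each with the filter."""
    output = []
    for row_key, cells in table:
        if not (start <= row_key < end):
            continue                   # row is outside the requested range
        kept = cell_filter(cells)      # evaluate the input row
        if kept:                       # model choice: skip rows with no cells left
            output.append((row_key, kept))
    return output


def only_cpu(cells: List[Cell]) -> List[Cell]:
    """Example filter: keep cells in the CPU column family."""
    return [c for c in cells if c.family == "CPU"]


table = [
    ("SERVER#1", [Cell("CPU", "load", b"0.7"), Cell("MEM", "used", b"12")]),
    ("SERVER#2", [Cell("MEM", "used", b"30")]),
    ("USER#9", [Cell("CPU", "load", b"0.1")]),
]
result = read_rows(table, "SERVER", "SERVES", only_cpu)
```

In this model the row range is applied first, so the filter never sees `USER#9`, and the output contains only the rows and cells that survive the filter.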
Bigtable provides several types of filters, as described in the following sections. Basic filters fall under two categories: limiting and modifying. You can combine basic filters into composing filters.
In most cases, a filter is applied to all rows unless you specify the row key, row range, or number of rows that the filter should be applied to. One exception is the row key regex filter, which can restrict the row range in certain cases if the regex is a fixed prefix. In general, to avoid a slow full table scan, always specify the rows that a filter should read.
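To see why a fixed prefix can stand in for a row range, the following sketch converts a key prefix into a half-open range of row keys. It is a general technique shown under assumptions, not a description of what Bigtable does internally; `prefix_to_row_range` is a hypothetical helper, and keys are assumed to sort as raw bytes.

```python
from typing import Optional, Tuple


def prefix_to_row_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start_inclusive, end_exclusive) covering every key with the prefix.

    The end is None when no finite upper bound exists, which happens for an
    empty prefix or a prefix made only of 0xff bytes.
    """
    # Trailing 0xff bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Increment the last remaining byte to get the smallest key after the prefix.
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end


start, end = prefix_to_row_range(b"SERVER")
```

Here `start` is `b"SERVER"` and `end` is `b"SERVES"`, so a read limited to that range touches only keys that begin with `SERVER` instead of scanning the whole table.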
Limiting filters
A limiting filter controls which rows or cells are included in the response, based on whether they match specific criteria. For example, you can say that the response should include only rows in which the row key matches a regular expression, or that you want only cells from a specific column family.
Many limiting filters can exclude cells from an output row. If all of the cells are excluded from an output row, the row is not included in the response.
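As a rough model of this behavior, the sketch below applies a column family regex to one row and returns no row at all when every cell is excluded. The names `Cell` and `family_regex` are hypothetical. The sketch uses Python's `re` module with full-string matching as a stand-in, which is an assumption of the model: Bigtable uses RE2 syntax, and the two regex dialects are not identical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    family: str
    qualifier: str
    value: bytes


def family_regex(pattern: str, row_key: str,
                 cells: List[Cell]) -> Optional[Tuple[str, List[Cell]]]:
    """Keep cells whose family matches the pattern; drop the row if none match."""
    compiled = re.compile(pattern)
    kept = [c for c in cells if compiled.fullmatch(c.family)]
    # If every cell was excluded, there is no output row.
    return (row_key, kept) if kept else None


row_a = family_regex("CPU.*", "SERVER#1",
                     [Cell("CPU", "load", b"0.7"), Cell("MEM", "used", b"12")])
row_b = family_regex("CPU.*", "SERVER#2", [Cell("MEM", "used", b"30")])
```

In this example `row_a` keeps only its `CPU` cell, while `row_b` is `None` because none of its cells matched.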
See the Summary of filters for a complete list of limiting filters.
Modifying filters
A modifying filter affects the data or metadata for individual cells.
Bigtable provides the following modifying filters:
- The strip value filter, which replaces each cell's value with an empty string. This filter is useful when you only need the number of rows or the list of row keys that meet your criteria, rather than the data from those rows.
- The apply label filter, which applies a label to each cell to identify which filter produced each cell in the response. Your application can use these labels to perform additional filtering on the client side.
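The effect of the two modifying filters can be pictured with the following sketch. It is a simplified model, not the Bigtable API: `Cell`, `strip_value`, and `apply_label` are hypothetical names, and storing labels as a tuple on each cell is an assumption made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Cell(NamedTuple):
    row_key: str
    qualifier: str
    value: bytes
    labels: Tuple[str, ...] = ()


def strip_value(cells: List[Cell]) -> List[Cell]:
    """Replace every cell value with an empty byte string."""
    return [c._replace(value=b"") for c in cells]


def apply_label(label: str, cells: List[Cell]) -> List[Cell]:
    """Attach a label to every cell so the client can tell where it came from."""
    return [c._replace(labels=c.labels + (label,)) for c in cells]


cells = [Cell("SERVER#1", "load", b"0.7"), Cell("SERVER#2", "load", b"0.9")]

# Row keys survive even though the values are gone.
stripped = strip_value(cells)
row_keys = sorted({c.row_key for c in stripped})

# The client can later select cells by label.
labeled = apply_label("cpu-branch", cells)
from_cpu_branch = [c for c in labeled if "cpu-branch" in c.labels]
```

The stripped cells still carry enough metadata to count rows or list row keys, and the labeled cells keep their values while gaining a tag for client-side filtering.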
Composing filters
A composing filter allows you to combine multiple basic filters into one, which makes it possible to apply more than one filter to a single read request. For example, to get CPU usage data for your servers, you could use one filter to include only rows where the row key starts with `SERVER`, followed by a second filter to include only cells within the `CPU` column family.
Bigtable provides the following composing filters:
- A chain, which applies a sequence of filters to each input row and returns an output row. A chain filter is like using a logical AND.
- An interleave, which sends each input row through multiple filters, then combines all of the filter results for the input row into a single output row. An interleave filter is like using a logical OR.
- A condition, which generates an output row by applying one of two possible filters to the input row. The filter is chosen by applying a predicate filter to the input row, then checking to see whether the predicate filter's output row contains any cells.
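The three composing filters listed above can be modeled as higher-order functions over a row's cells. This is a minimal sketch under assumptions, not the Bigtable API: every name here is hypothetical, each cell carries its own row key for simplicity, and the interleave model simply concatenates results without addressing ordering or duplicate cells.

```python
from typing import Callable, List, NamedTuple


class Cell(NamedTuple):
    row_key: str
    family: str
    qualifier: str
    value: bytes


CellFilter = Callable[[List[Cell]], List[Cell]]


def chain(*filters: CellFilter) -> CellFilter:
    """Feed the output of each filter into the next (logical AND)."""
    def run(cells: List[Cell]) -> List[Cell]:
        for f in filters:
            cells = f(cells)
        return cells
    return run


def interleave(*filters: CellFilter) -> CellFilter:
    """Send the same input through every filter and merge the results (logical OR)."""
    def run(cells: List[Cell]) -> List[Cell]:
        merged: List[Cell] = []
        for f in filters:
            merged.extend(f(cells))
        return merged
    return run


def condition(predicate: CellFilter, true_filter: CellFilter,
              false_filter: CellFilter) -> CellFilter:
    """Pick a filter based on whether the predicate's output has any cells."""
    def run(cells: List[Cell]) -> List[Cell]:
        chosen = true_filter if predicate(cells) else false_filter
        return chosen(cells)
    return run


def key_starts_with_server(cells: List[Cell]) -> List[Cell]:
    return [c for c in cells if c.row_key.startswith("SERVER")]


def family_is_cpu(cells: List[Cell]) -> List[Cell]:
    return [c for c in cells if c.family == "CPU"]


def family_is_mem(cells: List[Cell]) -> List[Cell]:
    return [c for c in cells if c.family == "MEM"]


row = [
    Cell("SERVER#1", "CPU", "load", b"0.7"),
    Cell("SERVER#1", "MEM", "used", b"12"),
    Cell("SERVER#1", "NET", "rx", b"99"),
]

server_cpu = chain(key_starts_with_server, family_is_cpu)(row)
cpu_or_mem = interleave(family_is_cpu, family_is_mem)(row)
mem_if_cpu = condition(family_is_cpu, family_is_mem, lambda cells: [])(row)
```

The `chain` call mirrors the `SERVER` and `CPU` example from the text: only the `CPU` cell of a `SERVER` row is left. The `condition` call returns the `MEM` cell because the predicate found at least one `CPU` cell in the row.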
Filters and performance
Filters allow you to retrieve only the data you actually need. As a result, filters can improve performance by reducing the amount of data that is sent to your application.
However, filters are not an all-purpose solution to every performance issue. In general, filters should be used to control throughput efficiency, not to reduce the latency between sending a request and receiving a response. Used correctly, filters can be an effective part of a strategy to improve read performance.
The condition filter, in particular, can increase latency, because conditions are much slower than other filters. If your read request is extremely performance-sensitive, do not use conditions in the request.
Summary of filters
The following tables list the filters that Bigtable provides, including links to details and code samples for each filter.
Limiting filter | Description |
---|---|
Block all | Don't emit any cells. Mostly useful for debugging. |
Cells per column limit | Include only the N most recent versions of a column in a row. |
Cells per row limit | Include only the first N cells from a row. |
Cells per row offset | Omit the first N cells from a row. |
Column family regex | Include only cells whose column family matches an RE2 regular expression. |
Column qualifier regex | Include only cells whose column qualifier matches a regular expression. |
Column range | Include only cells in a specific column family whose column qualifier is within a specific range. |
Pass all | Emit all input cells. Mostly useful for debugging. |
Row key regex | Include only cells whose row key matches a regular expression. |
Row sample | Retrieve a random sample of rows. |
Sink | Include cells in the final output row, and prevent them from being modified or removed by a subsequent filter. |
Timestamp range | Include only cells whose timestamp falls within a specific range. |
Value range | Include only cells whose value falls within a specific range. |
Value regex | Include only cells whose value matches a regular expression. |
Modifying filter | Description |
---|---|
Apply label | Add a label to all cells. |
Strip value | Return an empty string for each cell value. |
Composing filter | Description |
---|---|
Chain | Apply multiple filters in order. |
Condition | Apply one of two possible filters to a row. |
Interleave | Combine output rows from multiple filters into a single output row. |
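
Three of the limiting filters in the summary (cells per column limit, cells per row limit, and cells per row offset) select cells by position rather than by content. The sketch below models them over a single row. It is an illustration with hypothetical names, and it assumes that a row's cells arrive grouped by column with the most recent timestamp first.

```python
from itertools import groupby
from typing import List, NamedTuple


class Cell(NamedTuple):
    family: str
    qualifier: str
    timestamp: int
    value: bytes


def cells_per_row_limit(cells: List[Cell], n: int) -> List[Cell]:
    """Keep only the first n cells of the row."""
    return cells[:n]


def cells_per_row_offset(cells: List[Cell], n: int) -> List[Cell]:
    """Skip the first n cells of the row."""
    return cells[n:]


def cells_per_column_limit(cells: List[Cell], n: int) -> List[Cell]:
    """Keep the n most recent versions in each column.

    Assumes cells are grouped by (family, qualifier), newest first.
    """
    kept: List[Cell] = []
    for _, versions in groupby(cells, key=lambda c: (c.family, c.qualifier)):
        kept.extend(list(versions)[:n])
    return kept


row = [
    Cell("CPU", "load", 3, b"0.9"),
    Cell("CPU", "load", 2, b"0.7"),
    Cell("CPU", "load", 1, b"0.4"),
    Cell("CPU", "temp", 5, b"61"),
]
latest_per_column = cells_per_column_limit(row, 1)
first_two = cells_per_row_limit(row, 2)
after_three = cells_per_row_offset(row, 3)
```

With this input, `latest_per_column` holds the newest `load` and `temp` cells, `first_two` holds the two newest `load` cells, and `after_three` holds only the `temp` cell.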