Bigtable Row Filters
A RowFilter can be used when adding mutations to a ConditionalRow and when reading row data with read_row() or read_rows().
As laid out in the RowFilter definition, the following basic filters are provided:
SinkFilter
PassAllFilter
BlockAllFilter
RowKeyRegexFilter
RowSampleFilter
FamilyNameRegexFilter
ColumnQualifierRegexFilter
TimestampRangeFilter
ColumnRangeFilter
ValueRegexFilter
ValueRangeFilter
CellsRowOffsetFilter
CellsRowLimitFilter
CellsColumnLimitFilter
StripValueTransformerFilter
ApplyLabelFilter
In addition, these filters can be combined into composite filters with:
RowFilterChain
RowFilterUnion
ConditionalRowFilter
These rules can be nested arbitrarily, with a basic filter at the lowest level. For example:
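from google.cloud.bigtable.row_filters import (
    ApplyLabelFilter,
    ColumnQualifierRegexFilter,
    RowFilterChain,
    RowFilterUnion,
)

# Filter on a specified column qualifier (matching any column family).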
col1_filter = ColumnQualifierRegexFilter(b'columnbia')
# Create a filter to label results.
label1 = u'label-red'
label1_filter = ApplyLabelFilter(label1)
# Combine the filters to label all the cells in columnbia.
chain1 = RowFilterChain(filters=[col1_filter, label1_filter])
# Create a similar filter to label cells blue.
col2_filter = ColumnQualifierRegexFilter(b'columnseeya')
label2 = u'label-blue'
label2_filter = ApplyLabelFilter(label2)
chain2 = RowFilterChain(filters=[col2_filter, label2_filter])
# Bring our two labeled columns together.
row_filter = RowFilterUnion(filters=[chain1, chain2])
Filters for Google Cloud Bigtable Row classes.
class google.cloud.bigtable.row_filters.ApplyLabelFilter(label)
Bases: google.cloud.bigtable.row_filters.RowFilter
Filter to apply labels to cells.
Intended to be used as an intermediate filter on a pre-existing filtered result set. This way, if two sets are combined, the label can tell where the cell(s) originated. This allows the client to determine which results were produced from which part of the filter.
NOTE: Due to a technical limitation of the backend, it is not currently possible to apply multiple labels to a cell.
Parameters
label (str) – Label to apply to cells in the output row. Values must be at most 15 characters long, and match the pattern [a-z0-9\-]+.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
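The label constraints above (at most 15 characters, matching [a-z0-9\-]+) can be checked before a filter is built. The following is a minimal sketch using only the standard library; is_valid_label is a hypothetical helper, not part of google.cloud.bigtable, and the source does not say whether the client validates labels locally.

```python
import re

# Pattern and length limit quoted from the parameter description above.
LABEL_PATTERN = re.compile(r"[a-z0-9\-]+")
MAX_LABEL_LENGTH = 15


def is_valid_label(label: str) -> bool:
    """Return True if `label` satisfies the documented constraints.

    Hypothetical helper for illustration; not a library function.
    """
    if len(label) > MAX_LABEL_LENGTH:
        return False
    return LABEL_PATTERN.fullmatch(label) is not None


print(is_valid_label("label-red"))   # lowercase letters and a hyphen
print(is_valid_label("Label_Red"))   # uppercase and underscore are outside the pattern
print(is_valid_label("a" * 16))      # one character too long
```

Checking labels up front like this may surface a bad label earlier than a round trip to the backend would.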
class google.cloud.bigtable.row_filters.BlockAllFilter(flag)
Bases: google.cloud.bigtable.row_filters._BoolFilter
Row filter that doesn’t match any cells.
Parameters
flag (bool) – Does not match any cells, regardless of input. Useful for temporarily disabling just part of a filter.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.CellsColumnLimitFilter(num_cells)
Bases: google.cloud.bigtable.row_filters._CellCountFilter
Row filter to limit cells in a column.
Parameters
num_cells (int) – Matches only the most recent N cells within each column. This filters a (family name, column) pair, based on timestamps of each cell.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
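To make the "most recent N cells within each column" rule concrete, here is a small model written in plain Python. It is only a sketch of the semantics described above: cells are represented as hypothetical (family, qualifier, timestamp, value) tuples, and limit_cells_per_column is an illustrative name, not a library call.

```python
from collections import defaultdict


def limit_cells_per_column(cells, num_cells):
    """Keep the `num_cells` newest cells for each (family, qualifier) pair.

    `cells` is a list of (family, qualifier, timestamp, value) tuples.
    This models the documented behavior; it is not the server's code.
    """
    by_column = defaultdict(list)
    for cell in cells:
        family, qualifier = cell[0], cell[1]
        by_column[(family, qualifier)].append(cell)

    kept = []
    for column_cells in by_column.values():
        # Sort newest first by timestamp, then truncate to the limit.
        column_cells.sort(key=lambda c: c[2], reverse=True)
        kept.extend(column_cells[:num_cells])
    return kept


cells = [
    ("cf", b"a", 1, b"old"),
    ("cf", b"a", 3, b"new"),
    ("cf", b"a", 2, b"mid"),
    ("cf", b"b", 1, b"only"),
]
latest = limit_cells_per_column(cells, 1)
print(latest)
```

With a limit of 1, each column keeps only its newest version, which is a common way to read "current" values from a versioned table.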
class google.cloud.bigtable.row_filters.CellsRowLimitFilter(num_cells)
Bases: google.cloud.bigtable.row_filters._CellCountFilter
Row filter to limit cells in a row.
Parameters
num_cells (int) – Matches only the first N cells of the row.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.CellsRowOffsetFilter(num_cells)
Bases: google.cloud.bigtable.row_filters._CellCountFilter
Row filter to skip cells in a row.
Parameters
num_cells (int) – Skips the first N cells of the row.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
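Skipping the first N cells and then keeping the first M of the remainder behaves like a list slice. The sketch below models a CellsRowOffsetFilter followed by a CellsRowLimitFilter on an ordinary Python list; apply_offset_and_limit is a hypothetical helper for illustration, and the row is just a list of placeholder values.

```python
def apply_offset_and_limit(row_cells, offset=0, limit=None):
    """Model an offset filter followed by a limit filter on one row.

    Illustrative only: skips the first `offset` cells, then keeps at
    most `limit` of what remains (all of it when `limit` is None).
    """
    remaining = row_cells[offset:]
    return remaining if limit is None else remaining[:limit]


row = ["c0", "c1", "c2", "c3", "c4", "c5"]
page = apply_offset_and_limit(row, offset=2, limit=3)
print(page)
```

Varying the offset while holding the limit fixed gives a simple way to page through a wide row.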
class google.cloud.bigtable.row_filters.ColumnQualifierRegexFilter(regex)
Bases: google.cloud.bigtable.row_filters._RegexFilter
Row filter for a column qualifier regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
NOTE: Special care must be taken with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters
regex (bytes) – A regular expression (RE2) to match cells from columns whose qualifiers match this regex (irrespective of column family).
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
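The newline caveat in the NOTE above can be reproduced locally. Python's re module is not RE2 and has no \C escape, so the sketch below only demonstrates the analogous behavior: the dot skips the newline byte by default, and re.DOTALL stands in for the "true wildcard" role that \C plays in RE2.

```python
import re

# A column qualifier that happens to contain a newline byte.
qualifier = b"col\nname"

# By default "." does not match the newline byte, so this fails.
dot_match = re.fullmatch(rb"col.name", qualifier)

# re.DOTALL makes "." match every byte. This is Python-specific; with
# Bigtable's RE2 patterns the \C escape is the documented wildcard.
dotall_match = re.fullmatch(rb"col.name", qualifier, flags=re.DOTALL)

print(dot_match is None)
print(dotall_match is not None)
```

The practical consequence is that a pattern which looks like it matches everything may silently drop binary qualifiers or values containing \n.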
class google.cloud.bigtable.row_filters.ColumnRangeFilter(column_family_id, start_column=None, end_column=None, inclusive_start=None, inclusive_end=None)
Bases: google.cloud.bigtable.row_filters.RowFilter
A row filter to restrict to a range of columns.
Both the start and end column can be included or excluded in the range. By default, we include them both, but this can be changed with optional flags.
Parameters
column_family_id (str) – The column family that contains the columns. Must be of the form [_a-zA-Z0-9][-_.a-zA-Z0-9]*.
start_column (bytes) – The start of the range of columns. If no value is used, the backend applies no lower bound to the values.
end_column (bytes) – The end of the range of columns. If no value is used, the backend applies no upper bound to the values.
inclusive_start (bool) – Boolean indicating if the start column should be included in the range (or excluded). Defaults to True if start_column is passed and no inclusive_start was given.
inclusive_end (bool) – Boolean indicating if the end column should be included in the range (or excluded). Defaults to True if end_column is passed and no inclusive_end was given.
Raises
ValueError if inclusive_start is set but no start_column is given or if inclusive_end is set but no end_column is given
to_pb()
Converts the row filter to a protobuf.
First converts to a data_v2_pb2.ColumnRange and then uses it in the column_range_filter field.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
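The interaction between the bounds, the inclusive flags, and the ValueError conditions is easier to see in a small model. The function below is a sketch under two assumptions: qualifiers compare byte-wise (as Python bytes do), and in_column_range is a hypothetical name that mirrors the documented parameters rather than any library function.

```python
def in_column_range(qualifier, start_column=None, end_column=None,
                    inclusive_start=None, inclusive_end=None):
    """Return True if `qualifier` falls inside the described range.

    Illustrative model of the documented semantics only.
    """
    # Mirror the documented error conditions.
    if inclusive_start is not None and start_column is None:
        raise ValueError("inclusive_start was set but no start_column was given")
    if inclusive_end is not None and end_column is None:
        raise ValueError("inclusive_end was set but no end_column was given")

    # Documented defaults: a bound that is given is inclusive unless told otherwise.
    if inclusive_start is None:
        inclusive_start = True
    if inclusive_end is None:
        inclusive_end = True

    if start_column is not None:
        if qualifier < start_column:
            return False
        if qualifier == start_column and not inclusive_start:
            return False
    if end_column is not None:
        if qualifier > end_column:
            return False
        if qualifier == end_column and not inclusive_end:
            return False
    return True


print(in_column_range(b"a", start_column=b"a", end_column=b"c"))
print(in_column_range(b"a", start_column=b"a", end_column=b"c", inclusive_start=False))
```

Leaving a bound unset makes that side of the range open-ended, which is why an inclusive flag without its matching bound is rejected.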
class google.cloud.bigtable.row_filters.ConditionalRowFilter(base_filter, true_filter=None, false_filter=None)
Bases: google.cloud.bigtable.row_filters.RowFilter
Conditional row filter which exhibits ternary behavior.
Executes one of two filters based on another filter. If the base_filter returns any cells in the row, then true_filter is executed. If not, then false_filter is executed.
NOTE: The base_filter does not execute atomically with the true and false filters, which may lead to inconsistent or unexpected results.
Additionally, executing a ConditionalRowFilter has poor performance on the server, especially when false_filter is set.
Parameters
base_filter (RowFilter) – The filter to condition on before executing the true/false filters.
true_filter (RowFilter) – (Optional) The filter to execute if there are any cells matching base_filter. If not provided, no results will be returned in the true case.
false_filter (RowFilter) – (Optional) The filter to execute if there are no cells matching base_filter. If not provided, no results will be returned in the false case.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
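The ternary behavior can be modeled with plain functions that map a list of cells to a list of cells. This is a sketch, not the server's implementation: cells are hypothetical (qualifier, value) tuples, conditional_filter and has_qualifier are illustrative names, and the model assumes the chosen branch runs on the original row rather than on the output of base_filter.

```python
def conditional_filter(base_filter, true_filter=None, false_filter=None):
    """Build a callable that models the ternary behavior described above."""

    def apply(row_cells):
        # The base filter only decides which branch runs.
        branch = true_filter if base_filter(row_cells) else false_filter
        # A missing branch yields no results, as the parameter docs state.
        return branch(row_cells) if branch is not None else []

    return apply


def has_qualifier(name):
    """Return a filter keeping only cells whose qualifier equals `name`."""
    return lambda cells: [c for c in cells if c[0] == name]


row_with_flag = [(b"flag", b"1"), (b"name", b"alice")]
row_without_flag = [(b"name", b"bob")]

# Return the "name" cell only for rows that contain a "flag" cell.
row_filter = conditional_filter(
    base_filter=has_qualifier(b"flag"),
    true_filter=has_qualifier(b"name"),
)
print(row_filter(row_with_flag))
print(row_filter(row_without_flag))
```

Because no false_filter is supplied here, rows without a flag cell produce nothing, which matches the "no results will be returned in the false case" wording above.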
class google.cloud.bigtable.row_filters.ExactValueFilter(value)
Bases: google.cloud.bigtable.row_filters.ValueRegexFilter
Row filter for an exact value.
Parameters
value (bytes or str or int) – A literal string encodable as ASCII, or the equivalent bytes, or an integer (which will be packed into 8 bytes).
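The description says an integer is packed into 8 bytes but does not state the byte order. The sketch below assumes a big-endian signed 64-bit layout, which is the convention Bigtable counters commonly use; treat that layout as an assumption rather than something this page confirms. pack_int_value is a hypothetical helper.

```python
import struct


def pack_int_value(value: int) -> bytes:
    """Pack an integer into 8 bytes.

    Assumption: big-endian signed 64-bit (">q"). The reference text
    only says "packed into 8 bytes".
    """
    return struct.pack(">q", value)


packed = pack_int_value(1)
print(packed)
print(len(packed))
```

Under this assumption, a cell written as a 64-bit counter with value 1 would hold exactly these bytes, so an exact-value match on the integer and on the byte string would be equivalent.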
class google.cloud.bigtable.row_filters.FamilyNameRegexFilter(regex)
Bases: google.cloud.bigtable.row_filters._RegexFilter
Row filter for a family name regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
Parameters
regex (str) – A regular expression (RE2) to match cells from columns in a given column family. For technical reasons, the regex must not contain the ':' character, even if it is not being used as a literal.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.PassAllFilter(flag)
Bases: google.cloud.bigtable.row_filters._BoolFilter
Row filter equivalent to not filtering at all.
Parameters
flag (bool) – Matches all cells, regardless of input. Functionally equivalent to leaving filter unset, but included for completeness.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.RowFilter()
Bases: object
Basic filter to apply to cells in a row.
These values can be combined via RowFilterChain, RowFilterUnion and ConditionalRowFilter.
NOTE: This class is a do-nothing base class for all row filters.
class google.cloud.bigtable.row_filters.RowFilterChain(filters=None)
Bases: google.cloud.bigtable.row_filters._FilterCombination
Chain of row filters.
Sends rows through several filters in sequence. The filters are “chained” together to process a row. After the first filter is applied, the second is applied to the filtered output and so on for subsequent filters.
Parameters
filters (list) – List of RowFilter
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.RowFilterUnion(filters=None)
Bases: google.cloud.bigtable.row_filters._FilterCombination
Union of row filters.
Sends rows through several filters simultaneously, then merges / interleaves all the filtered results together.
If multiple cells are produced with the same column and timestamp, they will all appear in the output row in an unspecified mutual order.
Parameters
filters (list) – List of RowFilter
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
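The difference between a chain (sequential) and a union (parallel, then merged) can be modeled with functions over lists of cells. The sketch below reworks the labeling scenario from the introduction using hypothetical dict-based cells. All names are illustrative, and the union here simply concatenates branch outputs, whereas the real service interleaves them in an unspecified order.

```python
def chain(filters):
    """Apply each filter to the output of the previous one."""
    def apply(cells):
        for row_filter in filters:
            cells = row_filter(cells)
        return cells
    return apply


def union(filters):
    """Apply every filter to the same input and merge the outputs."""
    def apply(cells):
        merged = []
        for row_filter in filters:
            merged.extend(row_filter(cells))
        return merged
    return apply


def qualifier_is(name):
    return lambda cells: [c for c in cells if c["qualifier"] == name]


def label_with(label):
    # Copy each cell so the input row is left unmodified.
    return lambda cells: [dict(c, label=label) for c in cells]


row = [
    {"qualifier": b"columnbia", "value": b"1"},
    {"qualifier": b"columnseeya", "value": b"2"},
    {"qualifier": b"other", "value": b"3"},
]
row_filter = union([
    chain([qualifier_is(b"columnbia"), label_with("label-red")]),
    chain([qualifier_is(b"columnseeya"), label_with("label-blue")]),
])
labeled = row_filter(row)
print([(c["qualifier"], c["label"]) for c in labeled])
```

Each branch of the union sees the full row, so the labels record which branch produced each cell, and cells matched by neither branch are dropped.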
class google.cloud.bigtable.row_filters.RowKeyRegexFilter(regex)
Bases: google.cloud.bigtable.row_filters._RegexFilter
Row filter for a row key regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
NOTE: Special care must be taken with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters
regex (bytes) – A regular expression (RE2) to match cells from rows with row keys that satisfy this regex. For a CheckAndMutateRowRequest, this filter is unnecessary since the row key is already specified.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.RowSampleFilter(sample)
Bases: google.cloud.bigtable.row_filters.RowFilter
Matches all cells from a row with probability p.
Parameters
sample (float) – The probability of matching a cell (must be in the interval (0, 1); the end points are excluded).
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
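Sampling with probability p keeps each row independently with that probability, so the result size varies from run to run. The following is a local model of that idea, with the open-interval check taken from the parameter description. sample_rows is a hypothetical helper, and whether the client itself rejects out-of-range values is not stated in the source.

```python
import random


def sample_rows(rows, sample, rng=None):
    """Keep each row independently with probability `sample`.

    Illustrative model only; `rng` may be a seeded random.Random
    for reproducible runs.
    """
    # The documented interval is (0, 1) with both end points excluded.
    if not 0.0 < sample < 1.0:
        raise ValueError("sample must be strictly between 0 and 1")
    rng = rng or random.Random()
    return [row for row in rows if rng.random() < sample]


rows = list(range(1000))
picked = sample_rows(rows, 0.5, rng=random.Random(0))
print(len(picked))
```

The number of rows kept is only approximately p times the input size, so code that consumes a sampled read should not rely on an exact count.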
class google.cloud.bigtable.row_filters.SinkFilter(flag)
Bases: google.cloud.bigtable.row_filters._BoolFilter
Advanced row filter to skip parent filters.
Parameters
flag (bool) – ADVANCED USE ONLY. Hook for introspection into the row filter. Outputs all cells directly to the output of the read rather than to any parent filter. Cannot be used within the predicate_filter, true_filter, or false_filter of a ConditionalRowFilter.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.StripValueTransformerFilter(flag)
Bases: google.cloud.bigtable.row_filters._BoolFilter
Row filter that replaces each cell’s value with the empty string (0 bytes).
Parameters
flag (bool) – If True, replaces each cell’s value with the empty string. As the name indicates, this is more useful as a transformer than a generic query / filter.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.TimestampRange(start=None, end=None)
Bases: object
Range of time with inclusive lower and exclusive upper bounds.
Parameters
start (datetime.datetime) – (Optional) The (inclusive) lower bound of the timestamp range. If omitted, defaults to Unix epoch.
end (datetime.datetime) – (Optional) The (exclusive) upper bound of the timestamp range. If omitted, no upper bound is used.
to_pb()
Converts the TimestampRange to a protobuf.
Return type
data_v2_pb2.TimestampRange
Returns
The converted current object.
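The inclusive lower bound and exclusive upper bound form a half-open interval. The sketch below models that check with timezone-aware datetimes; in_timestamp_range is a hypothetical helper that follows the parameter descriptions above, not a method of TimestampRange.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def in_timestamp_range(timestamp, start=None, end=None):
    """Return True if start <= timestamp < end.

    Illustrative model: an omitted `start` defaults to the Unix epoch
    and an omitted `end` means no upper bound, as documented above.
    """
    lower = start if start is not None else EPOCH
    if timestamp < lower:
        return False
    if end is not None and timestamp >= end:
        return False
    return True


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, tzinfo=timezone.utc)

print(in_timestamp_range(start, start, end))  # lower bound is included
print(in_timestamp_range(end, start, end))    # upper bound is excluded
```

Half-open ranges make adjacent windows easy to compose: using one range's end as the next range's start covers every timestamp exactly once.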
class google.cloud.bigtable.row_filters.TimestampRangeFilter(range_)
Bases: google.cloud.bigtable.row_filters.RowFilter
Row filter that limits cells to a range of time.
Parameters
range_ (TimestampRange) – Range of time that cells should match against.
to_pb()
Converts the row filter to a protobuf.
First converts the range_ on the current object to a protobuf and then uses it in the timestamp_range_filter field.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.ValueRangeFilter(start_value=None, end_value=None, inclusive_start=None, inclusive_end=None)
Bases: google.cloud.bigtable.row_filters.RowFilter
A range of values to restrict to in a row filter.
Will only match cells that have values in this range.
Both the start and end value can be included or excluded in the range. By default, we include them both, but this can be changed with optional flags.
Parameters
start_value (bytes) – The start of the range of values. If no value is used, the backend applies no lower bound to the values.
end_value (bytes) – The end of the range of values. If no value is used, the backend applies no upper bound to the values.
inclusive_start (bool) – Boolean indicating if the start value should be included in the range (or excluded). Defaults to True if start_value is passed and no inclusive_start was given.
inclusive_end (bool) – Boolean indicating if the end value should be included in the range (or excluded). Defaults to True if end_value is passed and no inclusive_end was given.
Raises
ValueError if inclusive_start is set but no start_value is given or if inclusive_end is set but no end_value is given
to_pb()
Converts the row filter to a protobuf.
First converts to a data_v2_pb2.ValueRange and then uses it to create a row filter protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.
class google.cloud.bigtable.row_filters.ValueRegexFilter(regex)
Bases: google.cloud.bigtable.row_filters._RegexFilter
Row filter for a value regular expression.
The regex must be a valid RE2 pattern. See Google’s RE2 reference for the accepted syntax.
NOTE: Special care must be taken with the expression used. Since each of these properties can contain arbitrary bytes, the \C escape sequence must be used if a true wildcard is desired. The . character will not match the newline character \n, which may be present in a binary value.
Parameters
regex (bytes or str) – A regular expression (RE2) to match cells with values that match this regex. String values will be encoded as ASCII.
to_pb()
Converts the row filter to a protobuf.
Return type
data_v2_pb2.RowFilter
Returns
The converted current object.