Filter Transform

NOTE: Transforms are a part of the underlying language that is not directly accessible to users. This content is maintained for reference purposes only.

Removes or keeps the matching rows of data, based on a condition that you specify or a custom formula that you apply.

Basic Usage

You can filter your dataset based on the following condition types:

Example - is missing

filter missing: qty action: Drop

Output: Deletes all rows in which the value in the qty column is missing.

Example - is mismatched

filter col: CoName mismatched: 'String' action: Drop

Output: Deletes all rows in which the value in the CoName column does not match the String data type.

Example - is exactly

filter col: basic exactly: find(basic, '545', true, 1) == 8 action: Keep

Output: Keeps all rows where 545 appears at the eighth character in the basic column.

Example - is one of

filter col: zipCode oneOf: '94104','94105' action: Keep

Output: Keeps all rows in which the value of the zipCode column is either 94104 or 94105 and deletes all other rows in the dataset.

Example - Less than (or equal to)

filter col: row_number lessThanEqual: 5 action: Keep

Output: Keeps all rows in the dataset where the value in the row_number column is less than or equal to 5. All other rows are deleted.

Example - Greater than (or equal to)

filter col: row_number greaterThanEqual: 10 action: Drop

Output: Deletes all rows in the dataset where the value in row_number is greater than or equal to 10.

Example - Is Between

filter col: row_number greaterThan: 5 lessThanEqual: 15 action: Keep

Output: Keeps all rows where the row_number value is greater than 5 or less than or equal to 15. All other rows are deleted.

Example - Contains

filter col: phoneNum contains: `\({digit}{3}\)` action: Keep

Output: Keeps all rows where the phoneNum value contains a three-digit pattern surrounded by parentheses (XXX). All other rows are deleted.

Example - Starts with

filter col: phoneNum startsWith: '(981)' action: Keep

Output: Keeps all rows where the phoneNum value begins with (981). All other rows are deleted.

Example - Ends with

filter col: zipCode endsWith: `\-{digit}{4}` action: Drop

Output: Deletes all rows where the zipCode value ends with a four-digit extension.

Example - custom formula

filter row: (row_number >= 25 && firstName == 'Steve') action: Keep

Output: Keeps all rows where the row_number value is greater than or equal to 25 and the firstName value is Steve. All other rows are deleted.

Parameters

filter col:column_ref [missing: column_ref] [exactly: expression_ref] [mismatched: 'data_type_str'] [exactly: expression] [oneOf: 'string_1','string_2'] [lessthan | lessThanEqual: numVal] [greaterthan | greaterThanEqual: numVal] [contains: string_or_pattern] [startsWith|endsWith: string_or_pattern] action: [Drop|Keep]

TokenRequired?Data TypeDescription
filterYtransformName of the transform
rowNstringExpression identifying the row or rows to filter. If expression evaluates to true for a row, the row is either kept or deleted.
colNstringName of the column or expression for columns to drop
missingNstringName of column to evaluate for missing values.
mismatchedNstringString literal for the data type to check for mismatches.
exactlyNstring

String literal, Cloud Dataprep pattern, or regular expression that evaluates to an exact match for a row value in the specified column.

oneOfNstringList of string literals, any of which can be matched.
lessThan or lessThanEqualNinteger, decimal, or expression

Integer or decimal literal or expression evaluating to numeric value below which results in a match. Can also match on the specified expression exactly.

Parameter is also used for the Between condition type.

greaterThan or greaterThanEqualNinteger, decimal, or expression

Integer or decimal literal or expression evaluating to numeric value above which results in a match. Can also match on the specified expression exactly.

Parameter is also used for the Between condition type.

containsNstring

String literal, Cloud Dataprep pattern, or regular expression to be matches somewhere within the specified column values.

startsWithNstring

String literal, Cloud Dataprep pattern, or regular expression to match the beginnings of the values in the specified column.

endsWithNstring

String literal, Cloud Dataprep pattern, or regular expression to match the endings of the values in the specified column.

actionYstringDrop or Keep the listed columns

For more information on syntax standards, see Language Documentation Syntax Notes.

row

Expression to identify the row or rows on which to perform the transform. Expression must evaluate to true or false.

Examples:

ExpressionDescription
Score >= 50
true if the value in the Score column is greater than 50.
LEN(LastName) > 8
true if the length of the value in the LastName column is greater than 8.
ISMISSING([Title])
true if the row value in the Title column is missing.
ISMISMATCHED(Score,['Integer'])
true if the row value in the Score column is mismatched against the Integer data type.

Example:

delete row: (lastContactDate < 01/01/2010 || status == 'Inactive')

Output: Deletes any row in the dataset where the lastContactDate is before January 1, 2010 or the status is Inactive.

Usage Notes:

Required?Data Type
YesExpression that evaluates to true or false

col

Identifies the column or columns to which to apply the transform. You can specify one column or more columns.

Usage Notes:

Required?Data Type
YesString (column name)

missing

For the Is Missing condition type, this value specifies the column to check for missing values.

Usage Notes:

Required?Data Type
Required for Is Missing condition type onlyString (column name)

mismatched

For the Mismatched condition type, this value specifies string for the data type identifier value to check for mismatches. For more information, see Valid Data Type Strings.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Mismatched condition type onlyString (data type identifier)

exactly

For the Exactly condition type, this value is a String literal, Cloud Dataprep pattern, or regular expression that exactly matches row values in the specified column.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Exactly condition type onlyString (expression)

oneOf

For the One Of condition type, this value is a list of string literals, Cloud Dataprep patterns, or regular expressions. If a row value for the specified column matches one of these expressions, the row is either deleted or kept.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for One Of condition type only

List of string literals, Cloud Dataprep patterns, or regular expressions

lessThan or lessThanEqual

For the Less Than conditional types, this value is an Integer or Decimal literal or an expression that evaluates to an Integer or Decimal literal. If the value in the specified column is less than (or optionally equal to) this value, then the row is either deleted or kept.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Less than (or equal to) condition type onlyInteger or Decimal literal or expression evaluating to one of these data types

greaterThan or greaterThanEqual

For the Less Than conditional types, this value is an Integer or Decimal literal or an expression that evaluates to an Integer or Decimal literal. If the value in the specified column is greater than (or optionally equal to) this value, then the row is either deleted or kept.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Greater than (or equal to) condition type onlyInteger or Decimal type or expression evaluating to one of these data types

contains

For the Contains condition type, this value identifies a String literal, Cloud Dataprep pattern, or regular expression, which is used to evaluate partial or full matches to row values in the specified column.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Contains condition type only

String literal, Cloud Dataprep pattern, or regular expression

startsWith

For the Starts With condition type, this value identifies the String literal, Cloud Dataprep pattern, or regular expression with which a value must start in the specified column to match.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Starts with condition type only

String literal, Cloud Dataprep pattern, or regular expression

endsWith

For the Ends With condition type, this value identifies the String literal, Cloud Dataprep pattern, or regular expression with which a value must end in the specified column to match.

The col parameter is also required.

Usage Notes:

Required?Data Type
Required for Ends with condition type only

String literal, Cloud Dataprep pattern, or regular expression

action

Identifies whether the action performed by the transform:

  • Drop - Listed columns are dropped from the dataset.
  • Keep - Listed columns are retained in the dataset, and all other columns are dropped.

Usage Notes:

Required?Data Type
YesString (Drop or Keep)

Examples

See above.

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation
Need help? Visit our support page.