Countpattern Transform

Counts the number of instances of a specified pattern in a column and writes that value into a newly generated column. Source column is unchanged.

Basic Usage

countpattern col: myCol on: 'honda'

Output: Generates a new column containing the number of instances of the string honda that appear in each row of the column, myCol.

Parameters

countpattern col:column_ref [ignoreCase:true|false] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match']

TokenRequired?Data TypeDescription
countpatternYtransformName of the transform
colYstringSource column name
ignoreCaseNbooleanIf true, matching is case-insensitive.

Matching parameters:

NOTE: At least one of the following parameters must be included to specify the pattern to count: after, before, from, on, to.

TokenRequired?Data TypeDescription
afterNstringString literal or pattern that precedes the pattern to match
beforeNstringString literal or pattern that appears after the pattern to match
fromNstringString literal or pattern that identifies the start of the pattern to match
onNstringString literal or pattern that identifies the pattern to match.
toNstringString literal or pattern that identifies the end of the pattern to match

For more information on syntax standards, see Language Documentation Syntax Notes.

col

Identifies the column to which to apply the transform. You can specify only one column.

countpattern col: MyCol on: 'MyString'

Output: Counts the number of instances of the value MyString in the MyCol column and writes this value to a new column.

Usage Notes:

Required?Data Type
YesString (column name)

after

countpattern col: MyCol after: 'Important:'

Output: Counts 1 if there is a value that appears after the string Important: of value in MyCol . If the after value does not appear in the column, the output value is 0.

A pattern identifier that precedes the value or pattern to match. Define the after parameter value using string literals, regular expressions, or Cloud Dataprep patterns.

Usage Notes:

Required?Data Type
NoString (string literal or pattern)
  • The after and from parameters are very similar. from includes the matching value as part of the extracted string.
  • after can be used with either to, on, or before. See Pattern Clause Position Matching

before

A pattern identifier that occurs after the value or pattern to match. Define the pattern using string literals, regular expressions, or Cloud Dataprep patterns.

countpattern col: MyCol before: '|'

Output:

  • Counts 1 if there is a value that appears before the pipe character (|) in the MyCol column, and no other pattern parameter is specified. If the before value does not appear in the column, the output value is 0.
  • If another pattern parameter such as after is specified, the total count of instances is written to the new column.

Usage Notes:

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar. to includes the matching value as part of the extracted string.
  • before can be used with either from, on, or after. See Pattern Clause Position Matching .

from

Identifies the pattern that marks the beginning of the value to match. Pattern can be a string literal, Cloud Dataprep pattern, or regular expression. The from value is included in the match.

countpattern col: MyCol from: 'go:'

Output:

  • Counts 1 if contents from MyCol that occur from go:, to the end of the cell when no other pattern parameter is specified. If go: does not appear in the column, the output value is blank.
  • If another pattern parameter such as to is specified, the total count of instances is written to the new column.

Usage Notes:

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar. from includes the matching value as part of the extracted string.
  • from can be used with either to or before. See Pattern Clause Position Matching .

on

Identifies the pattern to match. Pattern can be a string literal, Cloud Dataprep pattern, or regular expression pattern.

countpattern col: MyCol on: `###ERROR`

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

Usage Notes:

Required?Data Type
No

String (literal, regular expression, or Cloud Dataprep pattern )

to

Identifies the pattern that marks the ending of the value to match. Pattern can be a string literal, Cloud Dataprep pattern, or regular expression. The to value is included in the match.

countpattern col:MyCol from:'note:' to: `/`

Output:

  • Counts instances from MyCol column of all values that begin with note: up to a backslash character.
  • If a second pattern parameter is not specified, then this value is either 0 or 1.

Usage Notes:

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar. to includes the matching value as part of the extracted string.
  • to can be used with either from or after. See Pattern Clause Position Matching.

ignoreCase

Indicates whether the match should ignore case or not.

  • Set to true to ignore case matching.
  • (Default) Set to false to perform case-sensitive matching.

countpattern col: MyCol on: 'My String' ignoreCase: true

Output: Counts the instances of the following values if they appear in the MyCol column: My String, my string, My string, etc.

Usage Notes:

Required?Data Type
NoBoolean

Examples

Example - counting patterns in tweets

Source:

The dataset below contains fictitious tweet information shortly after the release of an application called, "Myco ExampleApp".

DatetwitterIdisEmployeetweet
11/5/15lawrencetlu38141FALSEJust downloaded Myco ExampleApp! Transforming data in 5 mins!
11/5/15petramktng024TRUETry Myco ExampleApp, our new free data wrangling app! See www.example.com.
11/5/15joetri221TRUEProud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com.
11/5/15datadaemon994FALSEGreat start with Myco ExampleApp. Super easy to use, and actually fun.
11/5/1599redballoons99FALSELiking this new ExampleApp! Good job, guys!
11/5/15bigdatadan7182FALSE@support, how can I find example datasets for use with your product?

There are two areas of analysis:

  • For non-employees, you want to know if they are mentioning the new product by name.
  • For employees, you want to know if they are including cross-references to the web site as part of their tweet.

Transform:

The following counts the occurrences of the string ExampleApp in the tweet column. Note the use of the ignoreCase parameter to capture capitalization differences:

countpattern col:tweet on:'ExampleApp' ignoreCase:true

For non-employees, you want to track if they have mentioned the product in their tweet:

derive type:single value:IF(isEmployee=='FALSE' && countpattern_tweet=='1',true,false) as:'nonEmployeeExampleAppMentions'

The following counts the occurrences of example.com in their tweets:

countpattern col:tweet on:'example.com' ignoreCase:true

For employees, you want to track if they included the above cross-reference in their tweets:

derive type:single value:IF(isEmployee=='TRUE' && countpattern_tweet1 == 1, true, false) as:'employeeWebsiteCrossRefs'

Results:

After you drop the two columns tabulating the counts, you end up with the following:

DatetwitterIdisEmployeetweet employeeWebsiteCrossRefs nonEmployeeExampleAppMentions
11/5/15lawrencetlu38141FALSEJust downloaded Myco ExampleApp! Transforming data in 5 mins!falsetrue
11/5/15petramktng024TRUETry Myco ExampleApp, our new free data wrangling app! See www.example.com.truefalse
11/5/15joetri221TRUEProud to announce the release of Myco ExampleApp, the free version of our enterprise product. Check it out at www.example.com.truefalse
11/5/15datadaemon994FALSEGreat start with Myco ExampleApp. Super easy to use, and actually fun.falsetrue
11/5/1599redballoons99FALSELiking this new ExampleApp! Good job, guys!falsetrue
11/5/15bigdatadan7182FALSE@support, how can I find example datasets for use with your product?falsefalse

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation