Aggregate Transform

The aggregate transform performs summary calculations across a set of values in a column, grouped by the values in another column. For example, you can compute the average and standard deviation of test scores by student, by gender, by classroom number, or by all of those groups.

  • When the transform is applied, all other data in the dataset is removed. You can add additional columns as part of the aggregate transform.
  • For more information on available functions, see Aggregate Functions.

Basic Usage

aggregate value:MAX(totalSales) group: Region

Output: Reshapes the dataset into two columns. The new Region column contains the unique values from the source Region column, and the max_totalSales column contains the maximum value in the original totalSales column for each value in Region.
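
To make the reshaping concrete, the following is a minimal Python sketch of what the basic usage example computes. It is not how the product implements the transform. The helper name aggregate_max and the sample rows (including the Rep column) are hypothetical and exist only for illustration.

```python
def aggregate_max(rows, value_col, group_col):
    """Return one row per unique group value, holding the max of value_col."""
    best = {}
    for row in rows:
        key = row[group_col]
        val = row[value_col]
        # Keep the largest value seen so far for this group.
        if key not in best or val > best[key]:
            best[key] = val
    out_col = f"max_{value_col}"
    # Only the group column and the aggregate column survive.
    return [{group_col: k, out_col: v} for k, v in best.items()]


# Hypothetical sample data; Rep stands in for "all other data".
rows = [
    {"Region": "East", "totalSales": 120, "Rep": "Ana"},
    {"Region": "West", "totalSales": 90, "Rep": "Bo"},
    {"Region": "East", "totalSales": 200, "Rep": "Cy"},
    {"Region": "West", "totalSales": 150, "Rep": "Di"},
]

result = aggregate_max(rows, "totalSales", "Region")
for row in result:
    print(row)
```

In this sketch the Rep column is dropped from the result, which mirrors the note above that data outside the grouping and aggregated columns is removed.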

Parameters

aggregate value: AGGREGATE_FUNCTION(column_ref) [group: group_col]

Token | Required? | Transform Builder | Data Type | Description
aggregate | Y | Aggregate rows | transform | Name of the transform
value | Y | Functions | string | Expression that evaluates to the aggregate function call and its parameters. See Aggregate Functions.
group | N | Group by | string | Column name or names containing the values by which to group for calculation

For more information on syntax standards, see Language Documentation Syntax Notes.

value

For the aggregate transform, the value parameter contains the function call and its parameters, which define the set of columns to which the function is applied.

NOTE: For the value parameter, you can use only aggregate functions. For more information, see Aggregate Functions.

Usage Notes:

Required? | Data Type
Yes | String (expression that evaluates to a value using the referenced aggregate function)

group

For the aggregate transform, this parameter specifies the column or columns whose values are used to group the dataset prior to applying the specified function. You can specify multiple column names as comma-separated values.

If no group parameter is applied, the transform is applied across the entire dataset.
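
The two grouping behaviors described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the aggregate helper, the result column name, and the sample scores are all hypothetical. A tuple of column values serves as the group key, so an empty tuple of group columns places every row in a single group.

```python
from statistics import mean


def aggregate(rows, func, value_col, group_cols=()):
    """Apply func to value_col within each combination of group_cols."""
    groups = {}
    for row in rows:
        # With no group columns the key is (), so all rows share one group.
        key = tuple(row[c] for c in group_cols)
        groups.setdefault(key, []).append(row[value_col])
    return [
        {**dict(zip(group_cols, key)), "result": func(values)}
        for key, values in groups.items()
    ]


# Hypothetical sample data.
scores = [
    {"classroom": 101, "gender": "F", "score": 90},
    {"classroom": 101, "gender": "M", "score": 70},
    {"classroom": 101, "gender": "F", "score": 80},
    {"classroom": 102, "gender": "M", "score": 60},
]

# Multiple group columns: one output row per unique combination.
by_class_gender = aggregate(scores, mean, "score", ("classroom", "gender"))

# No group columns: a single output row for the entire dataset.
overall = aggregate(scores, mean, "score")
```

The number of output rows equals the number of unique group keys, which is why grouping on columns with many unique values can produce very large intermediate results.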

NOTE: Be careful when applying this transform across groups that contain a large number of unique rows. In some cases, the application can run out of memory while generating the results, and the transform can fail.

Usage Notes:

Required? | Data Type
No | String (column name)

Examples

See the individual functions for examples. See Aggregate Functions.
