SUM Function

Computes the sum of all values found in all row values in a column. Input column can be of Integer or Decimal.
  • If a row contains a missing or null value, it is not factored into the calculation. If no numeric values are found in the source column, the function returns 0.
  • When used in a pivot transform, the function is computed for each instance of the value specified in the group parameter. See Pivot Transform.

For a version of this function computed over a rolling window of rows, see ROLLINGSUM Function.

Basic Usage

pivot value:SUM(myRating) group:postal_code limit:1

Output: Generates a two-column column containing the unique values of the postal_code column and the sum of the group of values from the myRating column for the postal_code value. The limit parameter defines the maximum number of output columns.

Syntax

pivot value:SUM(function_col_ref) [group:group_col_ref] [limit:limit_count]

ArgumentRequired?Data TypeDescription
function_col_refYstringName of column to which to apply the function

For more information on the group and limit parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column the values of which you want to calculate the sum. Column must contain Integer or Decimal values.

  • Literal values are not supported as inputs.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference)myValues

Examples

This example illustrates the following functions:

  • LIST - Extracts up to 1000 values from one column into an array in a new column. See LIST Function.
  • LISTUNIQUE - Extracts up to 1000 unique values from one columninto an array in a new column. See LISTUNIQUE Function.

You have the following set of orders for two months, and you are interested in identifying the set of colors that have been sold for each product for each month and the total quantity of product sold for each month.

Source:

OrderIdDateItemQtyColor
10011/15/15Pants1red
10021/15/15Shirt2green
10031/15/15Hat3blue
10041/16/15Shirt4yellow
10051/16/15Hat5red
10061/20/15Pants6green
10071/15/15Hat7blue
10084/15/15Shirt8yellow
10094/15/15Shoes9brown
10104/16/15Pants1red
10114/16/15Hat2green
10124/16/15Shirt3blue
10134/20/15Shoes4black
10144/20/15Hat5blue
10154/20/15Pants6black

Transform:

To track by month, you need a column containing the month value extracted from the date:

set col:Date value:DATEFORMAT(Date, 'MMM yyyy')

You can use the following transform to check the list of unique values among the colors:

pivot value: LISTUNIQUE(Color, 1000) group: Date limit:10

Datelistunique_Color
Jan 2015["green","blue","red","yellow"]
Apr 2015["brown","blue","red","yellow","black","green"]

Delete the above transform.

You can aggregate the data in your dataset, grouped by the reformatted Date values, and apply the LIST function to the Color column. In the same aggregation, you can include a summation function for the Qty column:

pivot value: LIST(Color, 1000) SUM(Qty) group: Date limit:10

Results:

Datelist_Colorsum_Qty
Jan 2015["green","blue","blue","red","green","red","yellow"]28
Apr 2015["brown","blue","red","yellow","black","blue","black","green"] 38

If needed, you can unpack the list array data using the following:

unnest col:list_Color

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation
Need help? Visit our support page.