EXAMPLE - Statistical Functions

This example illustrates how you can apply statistical functions to your dataset. Calculations include average (mean), max, min, standard deviation, and variance.

Source:

Students took a test and recorded the following scores. You want to perform some statistical analysis on them:

Student   Score
Anna      84
Ben       71
Caleb     76
Danielle  87
Evan      85
Faith     92
Gabe      86
Hannah    99
Ian       73
Jane      68

Transform:

You can use the following transforms to calculate the average (mean), minimum, and maximum scores:

derive value:AVERAGE(Score) as:'avgScore'

derive value:MIN(Score) as:'minScore'

derive value:MAX(Score) as:'maxScore'
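To sanity-check these values outside the product, the same three aggregates can be reproduced in a few lines of Python. This is only an illustrative sketch, not part of the transform recipe; the variable names are assumptions chosen to mirror the new column names.

```python
from statistics import mean

# Scores copied from the Source table above.
scores = [84, 71, 76, 87, 85, 92, 86, 99, 73, 68]

avg_score = mean(scores)  # counterpart of AVERAGE(Score)
min_score = min(scores)   # counterpart of MIN(Score)
max_score = max(scores)   # counterpart of MAX(Score)

print(avg_score, min_score, max_score)
```

Each aggregate is a single value for the whole column, which is why the same number is repeated on every row of the results.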

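To measure how widely the scores are spread, you can use the VAR and STDEV functions, which can serve as the basis for other statistical calculations. These transforms do not specify an as: parameter; the resulting columns appear as var_Score and stdev_Score in the results below.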

derive value:VAR(Score)

derive value:STDEV(Score)
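The variance shown in the results (about 87.69) equals the sum of squared deviations from the mean divided by the number of rows (10), which is the population form of the calculation. The Python sketch below assumes that form and uses the standard library's pvariance and pstdev to reproduce the numbers; the variable names are illustrative only.

```python
from statistics import pvariance, pstdev

# Scores copied from the Source table above.
scores = [84, 71, 76, 87, 85, 92, 86, 99, 73, 68]

# Population variance: mean of the squared deviations from the average.
var_score = pvariance(scores)

# Population standard deviation: square root of the population variance.
stdev_score = pstdev(scores)

print(round(var_score, 2), round(stdev_score, 2))
```

Using the sample forms (statistics.variance and statistics.stdev, which divide by one less than the number of rows) would give larger values that do not match the results table.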

For each score, you can now calculate how many standard deviations it lies above or below the average, using the following:

derive value:((Score - avgScore) / stdev_Score) as:'stDevs'
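The stDevs calculation above subtracts the average from each score and divides by the standard deviation. As a rough cross-check, the following Python sketch applies the same arithmetic per student; it assumes the population standard deviation, and the dictionary and variable names are illustrative.

```python
from statistics import mean, pstdev

# Student scores copied from the Source table above.
students = {
    "Anna": 84, "Ben": 71, "Caleb": 76, "Danielle": 87, "Evan": 85,
    "Faith": 92, "Gabe": 86, "Hannah": 99, "Ian": 73, "Jane": 68,
}

avg = mean(students.values())
sd = pstdev(students.values())

# Distance of each score from the average, in units of standard deviation.
st_devs = {name: (score - avg) / sd for name, score in students.items()}

for name, value in st_devs.items():
    print(name, round(value, 2))
```

A positive value means the score is above the average; a negative value means it is below.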

Now, you want to assign grades according to the following scale:

Grade  Standard deviations from average (stDevs)
A      stDevs > 1
B      0.5 < stDevs <= 1
C      -1 <= stDevs <= 0.5
D      -2 <= stDevs < -1
F      stDevs < -2

You can build the following transform using the IF function to calculate grades.

derive value:IF((stDevs > 1),'A',IF((stDevs < -2),'F',IF((stDevs < -1),'D',IF((stDevs > 0.5),'B','C')))) as:'Grade'
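The nested IF evaluates its tests from the outside in, so the order of the conditions matters. A minimal Python sketch of the same decision chain is shown below; the function name grade is hypothetical and exists only for illustration.

```python
def grade(st_devs: float) -> str:
    """Mirror the nested IF: tests are evaluated in the same order."""
    if st_devs > 1:
        return "A"
    if st_devs < -2:   # checked before the D test, since -2.5 is also < -1
        return "F"
    if st_devs < -1:
        return "D"
    if st_devs > 0.5:
        return "B"
    return "C"         # everything from -1 through 0.5


for value in (1.06, 0.52, 0.20, -0.97, -1.19, -2.5):
    print(value, grade(value))
```

Testing for F before D is the important design choice: every value below -2 is also below -1, so reversing those two tests would mean that no row could ever receive an F.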

For more information, see IF Function.

To clean up the content, you might want to apply some formatting to the score columns. The following reformats the stdev_Score and stDevs columns to display two decimal places:

set col:stdev_Score value:NUMFORMAT(stdev_Score, '##.00')

set col:stDevs value:NUMFORMAT(stDevs, '##.00')
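For comparison, rounding a value to two decimal places for display can be sketched in Python with a format specifier. This is an analogy only and is not a description of how NUMFORMAT is implemented; the helper name two_decimals is hypothetical.

```python
def two_decimals(value: float) -> str:
    """Return the value as text with exactly two digits after the decimal point."""
    return f"{value:.2f}"


print(two_decimals(9.364294))   # 9.36
print(two_decimals(-1.185353))  # -1.19
```

As with the transforms above, the formatted result is text meant for display, so any further calculations are better done on the unformatted values.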

Results:

Student   Score  avgScore  minScore  maxScore  var_Score          stdev_Score  stDevs  Grade
Anna      84     82.1      68        99        87.69000000000142  9.36         0.20    C
Ben       71     82.1      68        99        87.69000000000142  9.36         -1.19   D
Caleb     76     82.1      68        99        87.69000000000142  9.36         -0.65   C
Danielle  87     82.1      68        99        87.69000000000142  9.36         0.52    B
Evan      85     82.1      68        99        87.69000000000142  9.36         0.31    C
Faith     92     82.1      68        99        87.69000000000142  9.36         1.06    A
Gabe      86     82.1      68        99        87.69000000000142  9.36         0.42    C
Hannah    99     82.1      68        99        87.69000000000142  9.36         1.80    A
Ian       73     82.1      68        99        87.69000000000142  9.36         -0.97   C
Jane      68     82.1      68        99        87.69000000000142  9.36         -1.51   D