Class Transformation (0.4.0)

Transformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Inheritance

builtins.object > proto.message.Message > Transformation

Classes

AutoTransformation

AutoTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The training pipeline will infer the proper transformation based on the statistics of the dataset.

CategoricalArrayTransformation

CategoricalArrayTransformation(
    mapping=None, *, ignore_unknown_fields=False, **kwargs
)

Treats the column as a categorical array and performs the following transformation functions.

  • For each element in the array, convert the category name to a dictionary lookup index and generate an embedding for each index. Combine the embeddings of all elements into a single embedding using the mean.
  • Empty arrays are treated as an embedding of zeroes.
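The per-element lookup and mean pooling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the training pipeline's implementation: `vocab` and `table` are hypothetical stand-ins for the lookup dictionary and the learned embedding table, and the fallback to a hypothetical `unknown_index` for unseen names is an assumption of this sketch.

```python
from typing import Dict, List, Sequence


def categorical_array_embedding(
    values: Sequence[str],
    vocab: Dict[str, int],
    table: List[List[float]],
    unknown_index: int,
) -> List[float]:
    """Mean of the per-element embeddings; all zeroes for an empty array."""
    dim = len(table[0])
    if not values:
        # Empty arrays are treated as an embedding of zeroes.
        return [0.0] * dim
    # Dictionary lookup per element; names missing from `vocab` fall back
    # to `unknown_index` (an assumption of this sketch).
    vectors = [table[vocab.get(v, unknown_index)] for v in values]
    # Combine the element embeddings component-wise using the mean.
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


vocab = {"red": 0, "blue": 1}
table = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # row 2: hypothetical "unknown"
print(categorical_array_embedding(["red", "blue"], vocab, table, 2))  # [0.5, 0.5]
print(categorical_array_embedding([], vocab, table, 2))  # [0.0, 0.0]
```

Because the mean is order-independent, `["red", "blue"]` and `["blue", "red"]` produce the same embedding.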

CategoricalTransformation

CategoricalTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The training pipeline will perform the following transformation functions.

  • The categorical string is used as is: no change to case, punctuation, spelling, tense, and so on.
  • Convert the category name to a dictionary lookup index and generate an embedding for each index.
  • Categories that appear fewer than 5 times in the training dataset are treated as the "unknown" category. The "unknown" category gets its own special lookup index and resulting embedding.
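A minimal sketch of how such a lookup index could be built from the training data, assuming the threshold of 5 stated above. Reserving index 0 for the "unknown" category (the hypothetical `UNKNOWN_INDEX`) and the helper names are illustrative choices, not the pipeline's actual layout.

```python
from collections import Counter
from typing import Dict, Iterable

UNKNOWN_INDEX = 0  # hypothetical: index reserved for the "unknown" category


def build_category_index(train_values: Iterable[str], min_count: int = 5) -> Dict[str, int]:
    """Give every category seen at least `min_count` times its own index."""
    counts = Counter(train_values)
    # Sorting only makes the index assignment deterministic.
    frequent = sorted(c for c, n in counts.items() if n >= min_count)
    return {c: i + 1 for i, c in enumerate(frequent)}


def lookup(value: str, index: Dict[str, int]) -> int:
    # The string is used as is (no case folding); rare or unseen
    # categories share the "unknown" index.
    return index.get(value, UNKNOWN_INDEX)


train = ["a"] * 5 + ["b"] * 4 + ["A"] * 6
index = build_category_index(train)
print(index)               # {'A': 1, 'a': 2}
print(lookup("b", index))  # 0, because "b" appears only 4 times
```

Note that `"a"` and `"A"` receive separate indices, matching the "no change to case" rule.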

NumericArrayTransformation

NumericArrayTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Treats the column as a numerical array and performs the following transformation functions.

  • All transformations for Numerical types are applied to the average of all elements.
  • The average of empty arrays is treated as zero.
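The reduction step above amounts to a mean with a zero default. A minimal sketch follows; the function name is hypothetical, and the result would then go through the Numerical transformations listed under NumericTransformation.

```python
from typing import Sequence


def numeric_array_to_scalar(values: Sequence[float]) -> float:
    """Average an array so it can be fed to the Numerical transformations."""
    if len(values) == 0:
        # The average of an empty array is treated as zero.
        return 0.0
    return sum(values) / len(values)


print(numeric_array_to_scalar([1.0, 2.0, 3.0]))  # 2.0
print(numeric_array_to_scalar([]))               # 0.0
```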

NumericTransformation

NumericTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The training pipeline will perform the following transformation functions.

  • The value, converted to float32.
  • The z_score of the value.
  • log(value+1) when the value is greater than or equal to 0. Otherwise, this transformation is not applied and the value is considered a missing value.
  • z_score of log(value+1) when the value is greater than or equal to 0. Otherwise, this transformation is not applied and the value is considered a missing value.
  • A boolean value that indicates whether the value is valid.
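The five derived features listed above can be sketched per value in plain Python. This assumes the training-set statistics (`mean`, `std` of the raw values and `log_mean`, `log_std` of log(value+1)) are precomputed with nonzero standard deviations, and it assumes "valid" means finite after float32 conversion; the source does not define validity, and all names here are hypothetical.

```python
import math
import struct
from dataclasses import dataclass
from typing import Optional


def to_float32(x: float) -> float:
    """Emulate float32 precision by round-tripping through a 4-byte float."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        # Too large for float32: saturate to +/- infinity.
        return math.copysign(math.inf, x)


@dataclass
class NumericFeatures:
    value: float                    # the value converted to float32
    z_score: Optional[float]
    log1p: Optional[float]          # None means "treated as missing"
    log1p_z_score: Optional[float]  # None means "treated as missing"
    is_valid: bool


def numeric_features(
    x: float, mean: float, std: float, log_mean: float, log_std: float
) -> NumericFeatures:
    v = to_float32(x)
    # Assumption: "valid" means finite after float32 conversion.
    valid = math.isfinite(v)
    z = (v - mean) / std if valid else None
    log_v: Optional[float] = None
    log_z: Optional[float] = None
    if valid and v >= 0:
        log_v = math.log1p(v)  # log(value + 1)
        log_z = (log_v - log_mean) / log_std
    # For negative values the log features stay None (missing).
    return NumericFeatures(v, z, log_v, log_z, valid)
```

For example, `numeric_features(-2.0, 0.0, 1.0, 0.0, 1.0)` yields a z-score of -2.0 but `None` for both log features, since log(value+1) is only applied to values greater than or equal to 0.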

TextArrayTransformation

TextArrayTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Treats the column as a text array and performs the following transformation functions.

  • Concatenate all text values in the array into a single text value using a space (" ") as a delimiter, then apply the transformations for Text columns to the result.
  • Empty arrays are treated as empty text.
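The concatenation step maps directly onto `str.join`, which also yields empty text for an empty array. A minimal sketch with a hypothetical function name; the output would then go through the Text transformations.

```python
from typing import Sequence


def text_array_to_text(values: Sequence[str]) -> str:
    """Join array elements with a single space; an empty array becomes ""."""
    return " ".join(values)


print(text_array_to_text(["hello", "world"]))  # hello world
print(repr(text_array_to_text([])))            # ''
```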

TextTransformation

TextTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The training pipeline will perform the following transformation functions.

  • The text is used as is: no change to case, punctuation, spelling, tense, and so on.
  • Tokenize text to words. Convert each word to a dictionary lookup index and generate an embedding for each index. Combine the embeddings of all words into a single embedding using the mean.
  • Tokenization is based on Unicode script boundaries.
  • Missing values get their own lookup index and resulting embedding.
  • Stop-words receive no special treatment and are not removed.
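The lookup-and-average steps above can be sketched as follows. Python's standard library cannot detect Unicode script boundaries, so this sketch swaps in a simpler regex tokenizer (runs of word characters); that substitution, the hypothetical reserved rows `MISSING_INDEX` and `UNKNOWN_INDEX`, and treating token-less text as missing are all assumptions, not the pipeline's behavior.

```python
import re
from typing import Dict, List, Optional

# Hypothetical reserved rows of the embedding table.
MISSING_INDEX = 0
UNKNOWN_INDEX = 1


def tokenize(text: str) -> List[str]:
    # Simplified stand-in for script-boundary tokenization: runs of word
    # characters. Case is preserved and stop-words are not removed.
    return re.findall(r"\w+", text)


def text_embedding(
    text: Optional[str], vocab: Dict[str, int], table: List[List[float]]
) -> List[float]:
    ids = [vocab.get(tok, UNKNOWN_INDEX) for tok in tokenize(text)] if text else []
    if not ids:
        # Missing values get their own embedding; treating token-less text
        # the same way is an assumption of this sketch.
        return list(table[MISSING_INDEX])
    dim = len(table[0])
    # Combine the token embeddings using the mean.
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


vocab = {"the": 2, "cat": 3}
table = [[0.0, 0.0], [9.0, 9.0], [1.0, 0.0], [0.0, 1.0]]
print(text_embedding("the cat", vocab, table))  # [0.5, 0.5]
print(text_embedding(None, vocab, table))       # [0.0, 0.0]
```

Since case is preserved, `"The"` does not match the vocabulary entry `"the"` and falls back to the unknown row in this sketch.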

TimestampTransformation

TimestampTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The training pipeline will perform the following transformation functions.

  • Apply the transformation functions for Numerical columns.
  • Determine the year, month, day, and weekday. Treat each value from the timestamp as a Categorical column.
  • Invalid numerical values (for example, values that fall outside of a typical timestamp range, or are extreme values) receive no special treatment and are not removed.
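A minimal sketch of splitting a timestamp into the categorical parts listed above, using the standard `datetime` module. It assumes the Numerical transformations are applied to seconds since the Unix epoch, which the source does not specify; the function name is hypothetical. Timezone-aware datetimes are used because `datetime.timestamp()` interprets naive values in local time.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple


def timestamp_features(ts: datetime) -> Tuple[float, Dict[str, str]]:
    """Return a numeric value plus categorical date parts for one timestamp."""
    # Assumption: the Numerical transformations are applied to seconds
    # since the Unix epoch.
    numeric = ts.timestamp()
    # Each part is emitted as a string so it can be treated as a category.
    parts = {
        "year": str(ts.year),
        "month": str(ts.month),
        "day": str(ts.day),
        "weekday": str(ts.weekday()),  # 0 = Monday, 6 = Sunday
    }
    return numeric, parts


numeric, parts = timestamp_features(datetime(2021, 3, 15, tzinfo=timezone.utc))
print(numeric)  # 1615766400.0
print(parts)    # {'year': '2021', 'month': '3', 'day': '15', 'weekday': '0'}
```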