Transformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Inheritance
builtins.object > proto.message.Message > Transformation
Classes
AutoTransformation
AutoTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The training pipeline infers the proper transformation based on the statistics of the dataset.
CategoricalArrayTransformation
CategoricalArrayTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Treats the column as a categorical array and performs the following transformation functions.
- For each element in the array, convert the category name to a dictionary lookup index and generate an embedding for each index. Combine the embeddings of all elements into a single embedding using the mean.
- Empty arrays are treated as an embedding of zeroes.
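The array handling above can be sketched in plain Python. This is an illustration only, not the pipeline's implementation: the vocabulary, the embedding table, and the helper name `embed_categorical_array` are hypothetical, and the embedding values are invented rather than learned. The sketch also assumes every category is already in the vocabulary.

```python
# Hypothetical vocabulary and embedding table (values invented for illustration).
VOCAB = {"red": 0, "green": 1, "blue": 2}
EMBEDDINGS = [
    [1.0, 0.0],  # index 0 -> "red"
    [0.0, 1.0],  # index 1 -> "green"
    [1.0, 1.0],  # index 2 -> "blue"
]
EMBEDDING_DIM = 2


def embed_categorical_array(categories):
    """Look up each category, then average the embeddings element-wise."""
    if not categories:
        # Empty arrays are treated as an embedding of zeroes.
        return [0.0] * EMBEDDING_DIM
    vectors = [EMBEDDINGS[VOCAB[name]] for name in categories]
    return [sum(component) / len(vectors) for component in zip(*vectors)]
```

With this table, `embed_categorical_array(["red", "green"])` averages the two rows component by component, and an empty list yields a zero vector of the same width.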
CategoricalTransformation
CategoricalTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The training pipeline performs the following transformation functions.
- The categorical string is used as is, with no change to case, punctuation, spelling, tense, and so on.
- Convert the category name to a dictionary lookup index and generate an embedding for each index.
- Categories that appear fewer than 5 times in the training dataset are treated as the "unknown" category. The "unknown" category gets its own special lookup index and resulting embedding.
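The lookup rules above can be sketched as follows. The helper names `build_vocab` and `lookup` are hypothetical, and reserving index 0 for "unknown" is an assumption made for this example; the source only states that "unknown" gets its own index.

```python
from collections import Counter

MIN_COUNT = 5      # categories seen fewer than 5 times map to "unknown"
UNKNOWN_INDEX = 0  # assumption: index 0 is reserved for "unknown"


def build_vocab(training_values):
    """Assign a lookup index to every category seen at least MIN_COUNT times."""
    counts = Counter(training_values)
    frequent = sorted(name for name, n in counts.items() if n >= MIN_COUNT)
    return {name: index + 1 for index, name in enumerate(frequent)}


def lookup(vocab, value):
    """Return the index for a category, or UNKNOWN_INDEX if it is rare or unseen."""
    # Strings are matched as is: "Cat" and "cat" are different categories.
    return vocab.get(value, UNKNOWN_INDEX)
```

Because the string is matched without normalization, differently cased spellings of the same word are counted separately and may individually fall below the threshold.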
NumericArrayTransformation
NumericArrayTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Treats the column as a numerical array and performs the following transformation functions.
- All transformations for Numerical types are applied to the average of all the elements.
- The average of empty arrays is treated as zero.
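A minimal sketch of the reduction step described above, assuming the array is already a list of numbers. The helper name `array_average` is hypothetical; its result would then be passed through the Numerical transformations.

```python
def array_average(values):
    """Reduce a numeric array to the single value the Numerical transformations use."""
    if not values:
        # The average of an empty array is treated as zero.
        return 0.0
    return sum(values) / len(values)
```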
NumericTransformation
NumericTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The training pipeline performs the following transformation functions.
- The value is converted to float32.
- The z_score of the value.
- log(value+1) when the value is greater than or equal to 0. Otherwise, this transformation is not applied and the value is considered a missing value.
- z_score of log(value+1) when the value is greater than or equal to 0. Otherwise, this transformation is not applied and the value is considered a missing value.
- A boolean value that indicates whether the value is valid.
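The features listed above can be illustrated with a small sketch. Several things here are assumptions rather than documented behavior: the helper name `numeric_features`, the idea that `mean`, `std`, `log_mean`, and `log_std` are statistics computed from the training data, and the rule that unparseable, NaN, or infinite inputs count as invalid. The sketch also uses Python's 64-bit floats instead of float32.

```python
import math


def numeric_features(value, mean, std, log_mean, log_std):
    """Derive the listed features for a single raw value."""
    try:
        x = float(value)  # Python floats are 64-bit; the pipeline uses float32
    except (TypeError, ValueError):
        x = math.nan
    if not math.isfinite(x):
        # Assumption: unparseable, NaN, and infinite inputs count as invalid.
        return {"value": None, "z_score": None, "log": None,
                "log_z_score": None, "is_valid": False}

    # log(value + 1) only applies to non-negative values; otherwise it is missing.
    log_value = math.log1p(x) if x >= 0 else None
    return {
        "value": x,
        "z_score": (x - mean) / std,
        "log": log_value,
        "log_z_score": None if log_value is None else (log_value - log_mean) / log_std,
        "is_valid": True,
    }
```

For a negative input the two log-based features come back as `None` (missing), while the plain value and its z-score are still produced.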
TextArrayTransformation
TextArrayTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Treats the column as a text array and performs the following transformation functions.
- Concatenate all text values in the array into a single text value using a space (" ") as a delimiter, and then treat the result as a single text value. Apply the transformations for Text columns.
- Empty arrays are treated as empty text.
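The concatenation step above amounts to a space-delimited join. A minimal sketch, with the hypothetical helper name `join_text_array`:

```python
def join_text_array(values):
    """Concatenate text values with a single space; an empty array yields ""."""
    return " ".join(values)
```

The joined string would then go through the same steps as a Text column.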
TextTransformation
TextTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The training pipeline performs the following transformation functions.
- The text is used as is, with no change to case, punctuation, spelling, tense, and so on.
- Tokenize text to words. Convert each word to a dictionary lookup index and generate an embedding for each index. Combine the embeddings of all elements into a single embedding using the mean.
- Tokenization is based on unicode script boundaries.
- Missing values get their own lookup index and resulting embedding.
- Stop-words receive no special treatment and are not removed.
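The text steps above can be approximated in a short sketch. The tokenizer here is a stand-in: it splits on word characters and punctuation with a regular expression, which is not the same as tokenizing on unicode script boundaries. The vocabulary, the embedding values, the reserved index for missing values, and the choice to skip out-of-vocabulary tokens are all assumptions made for the example.

```python
import re

MISSING_INDEX = 0  # assumption: index 0 is reserved for missing values
# Hypothetical vocabulary; case is preserved, so "The" and "the" differ.
VOCAB = {"The": 1, "the": 2, "cat": 3}
EMBEDDINGS = [
    [0.5, 0.5],  # index 0 -> missing value
    [1.0, 0.0],  # index 1 -> "The"
    [0.0, 1.0],  # index 2 -> "the"
    [1.0, 1.0],  # index 3 -> "cat"
]


def tokenize(text):
    """Stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def embed_text(text):
    """Average the embeddings of all known tokens; None maps to the missing index."""
    if text is None:
        return list(EMBEDDINGS[MISSING_INDEX])
    # Assumption: tokens outside the vocabulary are skipped in this sketch.
    vectors = [EMBEDDINGS[VOCAB[t]] for t in tokenize(text) if t in VOCAB]
    if not vectors:
        return [0.0, 0.0]
    return [sum(component) / len(vectors) for component in zip(*vectors)]
```

Stop-words such as "the" are kept and contribute to the mean like any other token.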
TimestampTransformation
TimestampTransformation(mapping=None, *, ignore_unknown_fields=False, **kwargs)
The training pipeline performs the following transformation functions.
- Apply the transformation functions for Numerical columns.
- Determine the year, month, day, and weekday. Treat each value from the timestamp as a Categorical column.
- Invalid numerical values (for example, values that fall outside of a typical timestamp range, or are extreme values) receive no special treatment and are not removed.
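The decomposition above can be sketched with the standard library. This example assumes the timestamp arrives as seconds since the Unix epoch and is interpreted in UTC; the source does not specify the format here. The helper name `timestamp_features` and the output keys are hypothetical.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def timestamp_features(epoch_seconds):
    """Split a timestamp into one numeric value and four categorical values."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        # Passed on to the Numerical transformations.
        "numeric": float(epoch_seconds),
        # Each part is a string so it can be treated as a Categorical value.
        "year": str(moment.year),
        "month": str(moment.month),
        "day": str(moment.day),
        "weekday": WEEKDAYS[moment.weekday()],
    }
```

Each categorical part would then follow the Categorical rules, and the numeric part the Numerical rules.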