Feature preprocessing overview
Feature preprocessing is one of the most important steps in the machine learning lifecycle. It consists of creating features and cleaning the training data. Creating features is also referred to as feature engineering.
BigQuery ML provides the following feature preprocessing techniques:
Automatic preprocessing. BigQuery ML performs automatic preprocessing during training. For more information, see Automatic feature preprocessing.
Manual preprocessing. You can use the TRANSFORM clause in the CREATE MODEL statement to define custom preprocessing using manual preprocessing functions. You can also use these functions outside of the TRANSFORM clause to process training data before creating the model.
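As a minimal sketch of manual preprocessing, the following statement defines two transformations inside the TRANSFORM clause. The dataset, table, model, and column names (mydataset, trips, trip_duration_model, trip_distance, passenger_age, trip_duration) are hypothetical and only illustrate the shape of the statement; ML.STANDARD_SCALER and ML.QUANTILE_BUCKETIZE are two of the manual preprocessing functions.

```sql
-- Hypothetical dataset, table, and column names; adjust to your own schema.
CREATE OR REPLACE MODEL `mydataset.trip_duration_model`
  TRANSFORM(
    -- Standardize a numeric column (zero mean, unit variance).
    ML.STANDARD_SCALER(trip_distance) OVER () AS trip_distance_scaled,
    -- Split a numeric column into 4 quantile-based buckets.
    ML.QUANTILE_BUCKETIZE(passenger_age, 4) OVER () AS passenger_age_bucket,
    -- Pass the label column through unchanged.
    trip_duration
  )
  OPTIONS (
    model_type = 'LINEAR_REG',
    input_label_cols = ['trip_duration']
  ) AS
SELECT trip_distance, passenger_age, trip_duration
FROM `mydataset.trips`;

-- The same function used outside of TRANSFORM, in an ordinary query
-- that prepares training data before any model is created.
SELECT
  ML.STANDARD_SCALER(trip_distance) OVER () AS trip_distance_scaled,
  trip_duration
FROM `mydataset.trips`;
```

Preprocessing defined in the TRANSFORM clause is stored with the model and reapplied at prediction time, whereas data prepared with a standalone query, as in the second statement, has to be prepared the same way again before prediction.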
Get feature information
You can use the ML.FEATURE_INFO function to retrieve the statistics of all input feature columns.
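For illustration, a call might look like the following sketch. The model name is hypothetical and assumed to refer to a model that has already been trained.

```sql
-- `mydataset.trip_duration_model` is a hypothetical, already-trained model.
SELECT *
FROM ML.FEATURE_INFO(MODEL `mydataset.trip_duration_model`);
```

The result contains one row per input feature column, with statistics such as the minimum, maximum, mean, standard deviation, and null count where they apply to the column type.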
Recommended knowledge
By using the default settings in the CREATE MODEL statements and the inference functions, you can create and use BigQuery ML models even without much ML knowledge. However, having basic knowledge about the ML development lifecycle, such as feature engineering and model training, helps you optimize both your data and your model to deliver better results. We recommend using the following resources to develop familiarity with ML techniques and processes:
- Machine Learning Crash Course
- Intro to Machine Learning
- Data Cleaning
- Feature Engineering
- Intermediate Machine Learning
What's next
Learn about feature serving in BigQuery ML.