BigQuery ML lets you create and execute machine learning models in BigQuery using GoogleSQL queries. BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
BigQuery ML functionality is available by using:
- The Google Cloud console
- The BigQuery REST API
- An external tool such as a Jupyter notebook or business intelligence platform
Machine learning on large datasets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.
BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts don't need to export small amounts of data to spreadsheets or other applications or wait for limited resources from a data science team.
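As a rough illustration of the build-and-evaluate workflow described above, the following Python sketch assembles three GoogleSQL statements that train, evaluate, and apply a binary logistic regression model. The dataset, table, model, and column names (`mydataset`, `purchases`, `new_visitors`, `purchase_model`, `will_buy`) are hypothetical placeholders. The sketch only builds the statements as strings, so running them still requires the Google Cloud console or another client.

```python
# Hypothetical names for illustration only; substitute your own
# dataset, tables, and label column.
DATASET = "mydataset"
MODEL = f"{DATASET}.purchase_model"
TRAINING_TABLE = f"{DATASET}.purchases"
NEW_DATA_TABLE = f"{DATASET}.new_visitors"
LABEL = "will_buy"


def create_model_sql(model, table, label):
    """Return a CREATE MODEL statement that trains a logistic regression model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


def evaluate_sql(model):
    """Return a query that reads evaluation metrics for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


def predict_sql(model, table):
    """Return a query that scores every row of `table` with the model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{table}`)"


statements = [
    create_model_sql(MODEL, TRAINING_TABLE, LABEL),
    evaluate_sql(MODEL),
    predict_sql(MODEL, NEW_DATA_TABLE),
]

# Print the statements so they can be pasted into a query editor.
for statement in statements:
    print(statement + ";\n")
```

Each statement is ordinary SQL, so the same text can be run from any of the access paths listed earlier.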
Supported models in BigQuery ML
A model in BigQuery ML represents what an ML system has learned from the training data.
BigQuery ML supports the following types of models:
- Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
- Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
- Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross-entropy loss function.
- K-means clustering for data segmentation; for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor a split of the data into training and evaluation sets.
- Matrix Factorization for creating product recommendation systems. You can create product recommendations using historical customer behavior, transactions, and product ratings and then use those recommendations for personalized customer experiences.
- Time series for performing time-series forecasts. You can use this feature to create millions of time series models and use them for forecasting. The model automatically handles anomalies, seasonality, and holidays.
- Boosted Tree for creating XGBoost-based classification and regression models.
- Deep Neural Network (DNN) for creating TensorFlow-based Deep Neural Networks for classification and regression models.
- Vertex AI AutoML Tables to perform machine learning with tabular data using simple processes and interfaces.
- TensorFlow model importing. This feature lets you create BigQuery ML models from previously trained TensorFlow models, then perform prediction in BigQuery ML.
- Autoencoder for creating TensorFlow-based BigQuery ML models with the support of sparse data representations. The models can be used in BigQuery ML for tasks such as unsupervised anomaly detection and non-linear dimensionality reduction.
In BigQuery ML, you can use a model with data from multiple BigQuery datasets for training and for prediction.
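The model types listed above are chosen through the `model_type` option of the `CREATE MODEL` statement. The Python sketch below maps each list entry to the option values it is understood to correspond to; treat the mapping as an assumption to check against the current `CREATE MODEL` reference. The helper functions and all dataset, table, and column names are hypothetical. The example query joins tables from two datasets to mirror the point about training on data from multiple datasets.

```python
# Assumed mapping from the model types listed above to model_type values.
# Verify against the CREATE MODEL reference before relying on it.
MODEL_TYPES = {
    "linear regression": ["LINEAR_REG"],
    "logistic regression (binary or multiclass)": ["LOGISTIC_REG"],
    "k-means clustering": ["KMEANS"],
    "matrix factorization": ["MATRIX_FACTORIZATION"],
    "time series": ["ARIMA_PLUS"],
    "boosted tree": ["BOOSTED_TREE_CLASSIFIER", "BOOSTED_TREE_REGRESSOR"],
    "deep neural network": ["DNN_CLASSIFIER", "DNN_REGRESSOR"],
    "AutoML Tables": ["AUTOML_CLASSIFIER", "AUTOML_REGRESSOR"],
    "imported TensorFlow model": ["TENSORFLOW"],
    "autoencoder": ["AUTOENCODER"],
}


def format_option(value):
    """Render a Python value as a GoogleSQL literal for the OPTIONS clause."""
    if isinstance(value, bool):  # check bool before other types
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return f"'{value}'"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(format_option(v) for v in value) + "]"
    return str(value)


def create_model_sql(model, model_type, query, **options):
    """Build a CREATE MODEL statement from a model type, options, and a query."""
    all_options = {"model_type": model_type, **options}
    rendered = ", ".join(
        f"{name} = {format_option(value)}" for name, value in all_options.items()
    )
    return f"CREATE OR REPLACE MODEL `{model}`\nOPTIONS ({rendered}) AS\n{query}"


# K-means needs no label column; the training query reads from two datasets.
kmeans_sql = create_model_sql(
    "mydataset.customer_segments",
    MODEL_TYPES["k-means clustering"][0],
    "SELECT c.total_spend, w.visits_per_month\n"
    "FROM `sales_dataset.customers` AS c\n"
    "JOIN `web_dataset.sessions` AS w USING (customer_id)",
    num_clusters=4,
)
print(kmeans_sql)
```

Supervised types would additionally pass a label, for example `input_label_cols=["label"]`, which the helper renders as a GoogleSQL array literal.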
Advantages of BigQuery ML
BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:
- BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. Predictive analytics can guide business decision-making across the organization.
- There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL, a language familiar to data analysts.
- BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. BigQuery ML has the following advantages over exporting and reformatting data:
  - Reduces complexity because fewer tools are required.
  - Increases speed to production because model training in BigQuery does not require moving and formatting large amounts of data for Python-based ML frameworks.
BigQuery ML and Vertex AI
BigQuery ML integrates with Vertex AI, Google Cloud's end-to-end AI/ML platform. When you register your BigQuery ML models to Vertex AI Model Registry, you can deploy these models to endpoints for online prediction.
- To learn more about using your BigQuery ML models with Vertex AI, see Manage BigQuery ML models with Vertex AI.
- If you aren't familiar with Vertex AI and want to learn more about how it integrates with BigQuery ML, see Vertex AI for BigQuery users.
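Registration can be requested at the time a model is created. The following Python sketch builds such a statement; it assumes the `model_registry` and `vertex_ai_model_id` options of `CREATE MODEL`, which should be confirmed against the current reference. The function name and every dataset, table, column, and model ID shown are hypothetical.

```python
def register_on_create_sql(model, table, label, vertex_model_id):
    """Build a CREATE MODEL statement that also registers the model.

    Assumes the model_registry and vertex_ai_model_id options; all
    argument values used below are hypothetical placeholders.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LINEAR_REG',\n"
        f"  input_label_cols = ['{label}'],\n"
        "  model_registry = 'VERTEX_AI',\n"
        f"  vertex_ai_model_id = '{vertex_model_id}'\n"
        ") AS\n"
        f"SELECT * FROM `{table}`"
    )


registration_sql = register_on_create_sql(
    model="mydataset.daily_sales_model",
    table="mydataset.daily_sales",
    label="units_sold",
    vertex_model_id="daily_sales_model",
)
print(registration_sql)
```

Deploying the registered model to an endpoint for online prediction is a separate step performed in Vertex AI, not part of this statement.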
BigQuery ML is supported in the same regions as BigQuery. See the locations page for a complete list of supported regions and multi-regions.
BigQuery ML models are stored in BigQuery datasets, in the same way as tables and views. For information about BigQuery ML pricing, see BigQuery ML pricing.
For information about BigQuery storage pricing, see Storage pricing. For information about BigQuery ML query pricing, see Query pricing.
In addition to BigQuery ML-specific limits, queries that use BigQuery ML functions and CREATE MODEL statements are subject to the quotas and limits on BigQuery query jobs.
For more information about all BigQuery ML quotas and limits, see Quotas and limits.
- To get started using BigQuery ML, see Getting started with BigQuery ML using the Google Cloud console.
- To learn more about machine learning and BigQuery ML, see the following resources:
  - Applying machine learning to your data with Google Cloud course at Coursera
  - Data and machine learning training program
  - Machine learning crash course
  - Machine learning glossary
- To learn about MLOps with Vertex AI Model Registry, see MLOps with Vertex AI.