What is BigQuery ML?
BigQuery ML lets you create and execute machine learning models using GoogleSQL queries. BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
BigQuery ML functionality is available by using:
- The Google Cloud console
- The bq command-line tool
- The BigQuery REST API
- An external tool such as a Jupyter notebook or business intelligence platform
Machine learning on large datasets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.
BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts don't need to export small amounts of data to spreadsheets or other applications or wait for limited resources from a data science team.
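The build, evaluate, and predict workflow described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the Python helpers only assemble GoogleSQL text that you would then submit through the console, the bq tool, or a notebook client. `CREATE MODEL`, `ML.EVALUATE`, and `ML.PREDICT` are GoogleSQL constructs; the helper names, the dataset `mydataset`, its tables, and the label column `units_sold` are hypothetical.

```python
def create_model_sql(model: str, model_type: str, label: str, source_table: str) -> str:
    """Build a CREATE MODEL statement that trains on all columns of source_table."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def evaluate_sql(model: str) -> str:
    """Build a query that returns evaluation metrics for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


def predict_sql(model: str, input_table: str) -> str:
    """Build a query that scores the rows of input_table with the model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"


# Hypothetical dataset, table, and column names, for illustration only.
statements = [
    create_model_sql(
        "mydataset.sales_model", "LINEAR_REG", "units_sold", "mydataset.sales_history"
    ),
    evaluate_sql("mydataset.sales_model"),
    predict_sql("mydataset.sales_model", "mydataset.upcoming_days"),
]
print(";\n\n".join(statements))
```

Because each step is an ordinary query, the data never leaves BigQuery. Only the statement text is built on the client side.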
Supported models
A model in BigQuery ML represents what a machine learning (ML) system has learned from training data. BigQuery ML supports the following types of models:
Internally trained models
The following models are built in to BigQuery ML:
- Linear regression is for forecasting. For example, this model forecasts the sales of an item on a given day. Labels are real-valued, meaning they cannot be +/- infinity or NaN.
- Logistic regression is for the classification of two or more possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values.
- K-means clustering is for data segmentation. For example, this model identifies customer segments. K-means is an unsupervised learning technique, so model training does not require labels or split data for training or evaluation.
- Matrix factorization is for creating product recommendation systems. You can create product recommendations using historical customer behavior, transactions, and product ratings, and then use those recommendations for personalized customer experiences.
- Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data's variation as possible.
- Time series is for performing time-series forecasts. You can use this feature to create millions of time series models and use them for forecasting. The model automatically handles anomalies, seasonality, and holidays.
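As a rough sketch, each built-in model in the list above corresponds to a `model_type` value in the `OPTIONS` clause of `CREATE MODEL`. The mapping below reflects the option values as commonly documented (`ARIMA_PLUS` for time series); the helper functions, table names, and column names are hypothetical. Note that the k-means statement sets no label column, consistent with it being an unsupervised technique.

```python
BUILT_IN_MODEL_TYPES = {
    "linear regression": "LINEAR_REG",
    "logistic regression": "LOGISTIC_REG",
    "k-means clustering": "KMEANS",
    "matrix factorization": "MATRIX_FACTORIZATION",
    "principal component analysis": "PCA",
    "time series": "ARIMA_PLUS",
}


def render_options(options: dict) -> str:
    """Render a dict as a GoogleSQL OPTIONS clause (strings, string lists, numbers)."""
    parts = []
    for key, value in options.items():
        if isinstance(value, str):
            rendered = f"'{value}'"
        elif isinstance(value, list):
            rendered = "[" + ", ".join(f"'{item}'" for item in value) + "]"
        else:
            rendered = str(value)
        parts.append(f"{key} = {rendered}")
    return "OPTIONS (" + ", ".join(parts) + ")"


def create_model_sql(model: str, options: dict, query: str) -> str:
    """Combine model name, options, and training query into one statement."""
    return f"CREATE OR REPLACE MODEL `{model}`\n{render_options(options)} AS\n{query}"


# Customer segmentation: no input_label_cols, because k-means needs no labels.
kmeans_sql = create_model_sql(
    "mydataset.customer_segments",  # hypothetical model name
    {"model_type": BUILT_IN_MODEL_TYPES["k-means clustering"], "num_clusters": 4},
    "SELECT total_spend, visits_per_month FROM `mydataset.customers`",
)

# One time series per item, identified by the (hypothetical) item_id column.
forecast_sql = create_model_sql(
    "mydataset.item_demand",
    {
        "model_type": BUILT_IN_MODEL_TYPES["time series"],
        "time_series_timestamp_col": "order_date",
        "time_series_data_col": "units_sold",
        "time_series_id_col": "item_id",
    },
    "SELECT order_date, units_sold, item_id FROM `mydataset.sales_history`",
)
print(kmeans_sql)
print(forecast_sql)
```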
Externally trained models
The following models are external to BigQuery ML and trained in Vertex AI:
- Deep neural network (DNN) is for creating TensorFlow-based deep neural networks for classification and regression models.
- Wide & Deep is useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
- Autoencoder is for creating TensorFlow-based models with the support of sparse data representations. The models can be used in BigQuery ML for tasks such as unsupervised anomaly detection and non-linear dimensionality reduction.
- Boosted Tree is for creating classification and regression models that are based on XGBoost.
- Random forest is for constructing multiple decision trees at training time for classification, regression, and other tasks.
- Vertex AI AutoML Tables is a supervised ML service that uses tabular data to build and deploy ML models on structured data at high speed and scale.
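Most of the model families above come in a classification and a regression variant, which is reflected in the `model_type` value. The following sketch derives that value from a family and a task. The `model_type` strings follow the naming used in the `CREATE MODEL` reference as best recalled here, so verify them before use; the function name and the example table are hypothetical.

```python
# Families that have both a classifier and a regressor variant.
TASK_FAMILIES = {
    "deep neural network": "DNN",
    "wide & deep": "DNN_LINEAR_COMBINED",
    "boosted tree": "BOOSTED_TREE",
    "random forest": "RANDOM_FOREST",
    "automl tables": "AUTOML",
}

TASK_SUFFIXES = {"classification": "CLASSIFIER", "regression": "REGRESSOR"}


def external_model_type(family: str, task: str = "classification") -> str:
    """Return the model_type string for an externally trained model family."""
    key = family.lower()
    if key == "autoencoder":
        # Autoencoders are unsupervised, so there is no task suffix.
        return "AUTOENCODER"
    return f"{TASK_FAMILIES[key]}_{TASK_SUFFIXES[task]}"


# Hypothetical churn model trained with boosted trees.
churn_sql = (
    "CREATE OR REPLACE MODEL `mydataset.churn_model`\n"
    f"OPTIONS (model_type = '{external_model_type('Boosted Tree')}', "
    "input_label_cols = ['churned']) AS\n"
    "SELECT * FROM `mydataset.customer_features`"
)
print(churn_sql)
```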
Remote models
You can create remote models in BigQuery with a Vertex AI endpoint or the remote_service_type option.
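A remote model statement might look like the sketch below. It assumes the `REMOTE WITH CONNECTION` clause and the `endpoint` and `remote_service_type` option names from the remote-model `CREATE MODEL` syntax, and it treats the two options as alternatives, which is an assumption drawn from the "or" in the sentence above. The helper name, connection ID, and model name are hypothetical. Models over custom endpoints may need additional clauses that are not shown here.

```python
from typing import Optional


def remote_model_sql(
    model: str,
    connection: str,
    *,
    endpoint: Optional[str] = None,
    remote_service_type: Optional[str] = None,
) -> str:
    """Build a CREATE MODEL statement for a remote model.

    Exactly one of endpoint or remote_service_type must be given
    (an assumption of this sketch, not a documented rule).
    """
    if (endpoint is None) == (remote_service_type is None):
        raise ValueError("pass exactly one of endpoint or remote_service_type")
    if endpoint is not None:
        option = f"endpoint = '{endpoint}'"
    else:
        option = f"remote_service_type = '{remote_service_type}'"
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS ({option})"
    )


# Hypothetical connection and model names.
nlp_model_sql = remote_model_sql(
    "mydataset.nlp_model",
    "myproject.us.my_connection",
    remote_service_type="CLOUD_AI_NATURAL_LANGUAGE_V1",
)
print(nlp_model_sql)
```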
Imported models
BigQuery ML lets you import custom models trained outside of BigQuery, then perform prediction within BigQuery. The following models can be imported into BigQuery from Cloud Storage:
- Open Neural Network Exchange (ONNX) is an open standard format for representing ML models. Using ONNX, you can make models trained with popular ML frameworks like PyTorch and scikit-learn available in BigQuery ML.
- TensorFlow is a free, open source software library for ML and artificial intelligence. TensorFlow can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. You can load previously trained TensorFlow models into BigQuery as BigQuery ML models and then perform prediction in BigQuery ML.
- TensorFlow Lite is a light version of TensorFlow for deployment on mobile devices, microcontrollers, and other edge devices. TensorFlow Lite optimizes existing TensorFlow models for reduced model size and faster inference.
- XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements ML algorithms under the Gradient Boosting framework.
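Importing one of the formats above can be sketched as a `CREATE MODEL` statement with no training query, only a `model_type` and a `model_path` pointing at Cloud Storage. The option names follow the import syntax as generally documented; the helper, bucket, and model names are hypothetical.

```python
IMPORT_MODEL_TYPES = {
    "onnx": "ONNX",
    "tensorflow": "TENSORFLOW",
    "tensorflow lite": "TENSORFLOW_LITE",
    "xgboost": "XGBOOST",
}


def import_model_sql(model: str, model_format: str, gcs_path: str) -> str:
    """Build a CREATE MODEL statement that imports a model from Cloud Storage."""
    if not gcs_path.startswith("gs://"):
        raise ValueError("model_path must be a Cloud Storage URI (gs://...)")
    model_type = IMPORT_MODEL_TYPES[model_format.lower()]
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', model_path = '{gcs_path}')"
    )


# Hypothetical bucket and object path.
onnx_sql = import_model_sql(
    "mydataset.imported_onnx", "ONNX", "gs://my-bucket/models/classifier.onnx"
)
print(onnx_sql)
```

After the import, prediction would use the same `ML.PREDICT` query shape as for a model trained in BigQuery.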
In BigQuery ML, you can use a model with data from multiple BigQuery datasets for training and for prediction.
Model selection guide
Advantages of BigQuery ML
BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:
- BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. Predictive analytics can guide business decision-making across the organization.
- There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL, a language familiar to data analysts.
- BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. BigQuery ML has the following advantages over exporting and reformatting data:
  - Reduces complexity because fewer tools are required
  - Increases speed to production because moving and formatting large amounts of data for Python-based ML frameworks is not required for model training in BigQuery
For more information, watch the video How to accelerate machine learning development with BigQuery ML.
BigQuery ML and Vertex AI
BigQuery ML integrates with Vertex AI, Google Cloud's end-to-end AI/ML platform. When you register your BigQuery ML models to Vertex AI Model Registry, you can deploy these models to endpoints for online prediction.
- To learn more about using your BigQuery ML models with Vertex AI, see Manage BigQuery ML models with Vertex AI.
- If you aren't familiar with Vertex AI and want to learn more about how it integrates with BigQuery ML, see Vertex AI for BigQuery users.
For more information, watch the video How to simplify AI models with Vertex AI and BigQuery ML.
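Registration with Vertex AI Model Registry can be requested at training time. The sketch below assumes the `model_registry` and `vertex_ai_model_id` options of `CREATE MODEL`; check the option names against the current reference before relying on them. The helper name and all identifiers are hypothetical. Deploying the registered model to an endpoint happens in Vertex AI and is not shown.

```python
def register_in_vertex_sql(
    model: str,
    model_type: str,
    label: str,
    source_table: str,
    vertex_ai_model_id: str,
) -> str:
    """Build a CREATE MODEL statement that also registers the model in Vertex AI."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        f"  model_type = '{model_type}',\n"
        f"  input_label_cols = ['{label}'],\n"
        "  model_registry = 'VERTEX_AI',\n"
        f"  vertex_ai_model_id = '{vertex_ai_model_id}'\n"
        ") AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names throughout.
registered_sql = register_in_vertex_sql(
    "mydataset.churn_model",
    "LOGISTIC_REG",
    "churned",
    "mydataset.customer_features",
    "churn_model_v1",
)
print(registered_sql)
```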
Supported regions
BigQuery ML is supported in the same regions as BigQuery. For more information, see BigQuery ML locations.
Pricing
BigQuery ML models are stored in BigQuery datasets like tables and views. For information about BigQuery ML pricing, see BigQuery ML pricing.
For information about BigQuery storage pricing, see Storage pricing. For information about BigQuery ML query pricing, see Query pricing.
Quotas
In addition to BigQuery ML-specific limits, queries that use BigQuery ML functions and CREATE MODEL statements are subject to the quotas and limits on BigQuery query jobs.
Limitations
- BigQuery ML is not available in Standard edition.
- BigQuery ML does not trigger autoscaling slots. You must set a baseline number of slots to use BigQuery ML with a BigQuery edition. This limitation applies only to externally trained models, not to internally trained models. For more information about the types of models, see Supported models.
What's next
- To get started using BigQuery ML, see Create machine learning models in BigQuery ML.
- To learn more about machine learning and BigQuery ML, see the following resources:
  - Applying machine learning to your data with Google Cloud course at Coursera
  - Data and machine learning training program
  - Machine learning crash course
  - Machine learning glossary
- To learn about MLOps with Vertex AI Model Registry, see MLOps with Vertex AI.