BigQuery ML enables users to create and execute machine learning models in BigQuery using standard SQL queries. BigQuery ML democratizes machine learning by enabling SQL practitioners to build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
BigQuery ML functionality is available by using:
- The BigQuery web UI
- The BigQuery REST API
- An external tool such as a Jupyter notebook or business intelligence platform
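Because BigQuery ML statements are ordinary SQL queries, a client that talks to the REST API can submit them like any other query. The following is a minimal sketch, assuming the `jobs.query` method and its `query` and `useLegacySql` request fields. The project, dataset, and model names are hypothetical, and authentication and the HTTP call itself are deliberately left out.

```python
import json

# Endpoint pattern of the BigQuery REST API `jobs.query` method.
QUERY_ENDPOINT = (
    "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
)


def build_query_request(project_id, sql):
    """Return (url, json_body) for a jobs.query call; nothing is sent."""
    url = QUERY_ENDPOINT.format(project_id=project_id)
    body = {
        "query": sql,
        "useLegacySql": False,  # BigQuery ML statements use standard SQL
    }
    return url, json.dumps(body)


# Hypothetical project, dataset, and model names.
url, body = build_query_request(
    "my-project",
    "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.my_model`)",
)
print(url)
print(body)
```

The same SQL text could equally be pasted into the web UI or run from a notebook; only the transport differs.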
Machine learning on large data sets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.
BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts no longer need to export small amounts of data to spreadsheets or other applications, and they no longer need to wait for limited resources from a data science team.
Supported models in BigQuery ML
A model in BigQuery ML represents what an ML system has learned from the training data.
The following types of models are supported by BigQuery ML:
- Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
- Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must have only two possible values.
- Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross entropy loss function.
- K-means clustering for data segmentation; for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor a split of the data into training and evaluation sets.
- TensorFlow model importing. This feature allows you to create BigQuery ML models from previously trained TensorFlow models, then perform prediction in BigQuery ML. See the CREATE MODEL statement for importing TensorFlow models for more information.
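The multinomial classifier with a cross entropy loss mentioned for multiclass logistic regression can be illustrated with a small textbook sketch. This is not BigQuery ML's internal implementation, which this page does not describe. It only shows what the loss measures for one example, using hypothetical class scores.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(scores)[true_class])


# Hypothetical scores for the classes low-value, medium-value, high-value.
scores = [0.5, 2.0, 0.1]
print(softmax(scores))
print(cross_entropy(scores, true_class=1))  # smaller loss: class 1 has the highest score
print(cross_entropy(scores, true_class=2))  # larger loss: class 2 has the lowest score
```

Training lowers this loss, averaged over the training rows, so that the correct class receives a higher probability.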
In BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and for prediction.
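As a rough sketch of how the model types above map to SQL, the helper below assembles the text of a CREATE MODEL statement. The `model_type`, `input_label_cols`, and `num_clusters` options are taken from the CREATE MODEL syntax. The helper itself and every dataset, table, and column name are hypothetical, and the model is placed in a different dataset from its training data to illustrate the point about multiple datasets.

```python
# model_type values for the model families listed above, as used in the
# OPTIONS clause of CREATE MODEL.
MODEL_TYPES = {
    "linear regression": "linear_reg",
    "logistic regression": "logistic_reg",  # binary and multiclass
    "k-means": "kmeans",
}


def create_model_sql(model, family, training_query, label=None, num_clusters=None):
    """Assemble a CREATE MODEL statement as text; nothing is executed."""
    options = [f"model_type='{MODEL_TYPES[family]}'"]
    if label is not None:
        options.append(f"input_label_cols=['{label}']")
    if num_clusters is not None:
        options.append(f"num_clusters={num_clusters}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS({', '.join(options)}) AS\n"
        f"{training_query}"
    )


# Hypothetical names: models live in `models_ds`, training data in `sales_ds`.
print(create_model_sql(
    "models_ds.purchase_model",
    "logistic regression",
    "SELECT will_buy, country, pageviews FROM `sales_ds.sessions`",
    label="will_buy",
))
print(create_model_sql(
    "models_ds.segments",
    "k-means",
    "SELECT total_spend, visits FROM `sales_ds.customers`",
    num_clusters=4,
))
```

The k-means call passes no label, which matches the unsupervised nature of that model type.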
Advantages of BigQuery ML
BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:
- BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. This enables business decision making through predictive analytics across the organization.
- There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL, a language data analysts know.
BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. Exporting and re-formatting the data:
- Increases complexity: multiple tools are required.
- Reduces speed: moving and formatting large amounts of data for Python-based ML frameworks takes longer than model training in BigQuery.
- Requires multiple steps to export data from the warehouse, restricting the ability to experiment on your data.
- Can be prevented by legal restrictions (such as HIPAA guidelines).
BigQuery ML is supported in the same regions as BigQuery. See the Locations page for a complete list of supported regions and multi-regions.
For more information on all BigQuery ML quotas and limits, see Quotas and limits.
BigQuery ML models are stored in BigQuery datasets like tables and views. When you create and use models in BigQuery ML, your charges are based on how much data is used to train a model and on the queries you run against the data.
To learn more about machine learning and BigQuery ML, see the following resources:
- Applying machine learning to your data with GCP course at Coursera
- Data and machine learning training program
- Machine learning crash course
- Machine learning glossary
To get started using BigQuery ML, see Getting started with BigQuery ML using the web UI.