BigQuery ML enables users to create and execute machine learning models in BigQuery using standard SQL queries. BigQuery ML democratizes machine learning by enabling SQL practitioners to build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.
BigQuery ML currently supports the following types of models:
- Linear regression: These models can be used for predicting a numerical value.
- Binary logistic regression: These models can be used for predicting one of two classes (such as identifying whether an email is spam).
- Multiclass logistic regression for classification: These models can be used to predict one of more than two classes, such as whether an input is "low-value", "medium-value", or "high-value".
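As a rough illustration of how the model types above map onto SQL, the sketch below assembles a `CREATE MODEL` statement as text. Python is used here only to build the string; nothing is sent to BigQuery. The helper `create_model_sql`, the dataset `mydataset`, the table `emails`, and the column names are hypothetical. The sketch also assumes that both logistic regression variants use `model_type='logistic_reg'`, with the binary or multiclass distinction following from the number of distinct label values.

```python
# Hypothetical mapping from the model families listed above to the
# `model_type` option of the BigQuery ML CREATE MODEL statement.
# Assumption: binary and multiclass logistic regression share 'logistic_reg'.
MODEL_TYPES = {
    "linear_regression": "linear_reg",
    "binary_logistic_regression": "logistic_reg",
    "multiclass_logistic_regression": "logistic_reg",
}


def create_model_sql(model_name, family, label_col, training_query):
    """Return a CREATE MODEL statement as a string (nothing is executed)."""
    model_type = MODEL_TYPES[family]
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', "
        f"input_label_cols=['{label_col}']) AS\n"
        f"{training_query}"
    )


# Illustrative names only: a spam classifier trained on a hypothetical table.
sql = create_model_sql(
    "mydataset.spam_model",
    "binary_logistic_regression",
    "is_spam",
    "SELECT subject_length, sender_domain, is_spam FROM `mydataset.emails`",
)
print(sql)
```

The resulting text is an ordinary SQL statement, so it could be pasted into the BigQuery web UI or submitted through any of the interfaces listed in this document.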
BigQuery ML functionality is available by using:
- The BigQuery web UI
- The BigQuery REST API
- An external tool such as a Jupyter notebook or business intelligence platform
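Whichever interface is used, the work is expressed as SQL text. The sketch below shows, under stated assumptions, how a notebook might prepare evaluation and prediction queries with the `ML.EVALUATE` and `ML.PREDICT` functions. The helper names, the model `mydataset.spam_model`, and the tables are hypothetical, and the step of actually submitting the queries with a client library or the REST API is deliberately left out.

```python
def evaluate_sql(model_name, eval_query):
    """Build a query that scores a trained model on held-out rows."""
    return (
        "SELECT *\n"
        f"FROM ML.EVALUATE(MODEL `{model_name}`,\n"
        f"  ({eval_query}))"
    )


def predict_sql(model_name, input_query):
    """Build a query that applies a trained model to new rows."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  ({input_query}))"
    )


# Hypothetical model and tables, for illustration only.
eval_q = evaluate_sql(
    "mydataset.spam_model",
    "SELECT subject_length, sender_domain, is_spam FROM `mydataset.emails_holdout`",
)
pred_q = predict_sql(
    "mydataset.spam_model",
    "SELECT subject_length, sender_domain FROM `mydataset.new_emails`",
)
print(eval_q)
print(pred_q)
```

Keeping the queries as plain strings means the same text can be reused unchanged across the web UI, the REST API, and external tools.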
Machine learning on large data sets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.
BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts no longer need to export small amounts of data to spreadsheets or other applications, and they no longer need to wait for limited resources from a data science team.
Advantages of BigQuery ML
BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:
- BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. This enables business decision making through predictive analytics across the organization.
- There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL — a language data analysts know.
BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. Exporting and re-formatting the data:
- Increases complexity: Multiple tools are required.
- Reduces speed: Moving and formatting large amounts of data for Python-based ML frameworks takes longer than model training in BigQuery.
- Requires multiple steps to export data from the warehouse, restricting the ability to experiment on your data.
- Can be prevented by legal restrictions (such as HIPAA guidelines).
Like BigQuery, BigQuery ML is a multi-regional resource. BigQuery ML supports the same regions as BigQuery.
Data locality is specified when you create a dataset to store your BigQuery ML models and training data. BigQuery ML processes and stages data in the same location as the target dataset.
For more information on all quotas and limits, see Quotas and Limits.
BigQuery ML models are stored in BigQuery datasets like tables and views. When you create and use models in BigQuery ML, your charges are based on how much data is used to train a model and on the queries you run against the data.
To learn more about machine learning and BigQuery ML, see the following resources:
- Applying machine learning to your data with GCP course at Coursera
- Data and machine learning training program
- Machine learning crash course
- Machine learning glossary
To get started using BigQuery ML, see Getting started with BigQuery ML for data analysts or Getting started with BigQuery ML for data scientists.