What is BigQuery ML?

BigQuery ML lets you create and execute machine learning models using GoogleSQL queries. BigQuery ML democratizes machine learning by letting SQL practitioners build models using existing SQL tools and skills. BigQuery ML increases development speed by eliminating the need to move data.

BigQuery ML functionality is available by using:

  • The Google Cloud console
  • The bq command-line tool
  • The BigQuery REST API
  • An external tool such as a Jupyter notebook or business intelligence platform

Machine learning on large datasets requires extensive programming and knowledge of ML frameworks. These requirements restrict solution development to a very small set of people within each company, and they exclude data analysts who understand the data but have limited machine learning knowledge and programming expertise.

BigQuery ML empowers data analysts to use machine learning through existing SQL tools and skills. Analysts can use BigQuery ML to build and evaluate ML models in BigQuery. Analysts don't need to export small amounts of data to spreadsheets or other applications or wait for limited resources from a data science team.
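As a rough illustration of what "building a model with SQL" looks like, the sketch below composes a `CREATE MODEL` statement and an `ML.PREDICT` query as plain strings. The dataset, table, and column names (`mydataset.sessions`, `will_purchase`, and so on) and the two helper functions are hypothetical and exist only for this example. The snippet does not contact BigQuery; it only renders text that you could then submit through any of the interfaces listed earlier.

```python
def create_model_sql(model_name, model_type, label_col, training_query):
    """Render a GoogleSQL CREATE MODEL statement as a string."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_col}']) AS\n"
        f"{training_query.strip()}"
    )


def predict_sql(model_name, input_query):
    """Render a query that scores new rows with ML.PREDICT."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  ({input_query.strip()}))"
    )


# Hypothetical dataset, table, and column names, used only for illustration.
train_stmt = create_model_sql(
    model_name="mydataset.purchase_model",
    model_type="LOGISTIC_REG",
    label_col="will_purchase",
    training_query=(
        "SELECT will_purchase, country, pageviews FROM `mydataset.sessions`"
    ),
)
predict_stmt = predict_sql(
    "mydataset.purchase_model",
    "SELECT country, pageviews FROM `mydataset.new_sessions`",
)

print(train_stmt)
print()
print(predict_stmt)
```

Both the training step and the prediction step are ordinary queries over tables that already live in the warehouse, so no data is exported at any point.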

Supported models

A model in BigQuery ML represents what an ML system has learned from the training data.

BigQuery ML supports the following types of models:

  • Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
  • Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
  • Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross-entropy loss function.
  • K-means clustering for data segmentation; for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor a split of the data into training and evaluation sets.
  • Matrix Factorization for creating product recommendation systems. You can create product recommendations using historical customer behavior, transactions, and product ratings and then use those recommendations for personalized customer experiences.
  • Time series for performing time-series forecasts. You can use this feature to create millions of time series models and use them for forecasting. The model automatically handles anomalies, seasonality, and holidays.
  • Boosted Tree for creating XGBoost-based classification and regression models.
  • Deep Neural Network (DNN) for creating TensorFlow-based Deep Neural Networks for classification and regression models.
  • Vertex AI AutoML Tables to perform machine learning with tabular data using simple processes and interfaces.
  • TensorFlow model importing. This feature lets you create BigQuery ML models from previously trained TensorFlow models, then perform prediction in BigQuery ML.
  • Autoencoder for creating TensorFlow-based BigQuery ML models with the support of sparse data representations. The models can be used in BigQuery ML for tasks such as unsupervised anomaly detection and non-linear dimensionality reduction.
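The label constraints stated in the list above (finite real values for linear regression, exactly two values for binary logistic regression, at most 50 unique values for multiclass logistic regression, no labels for k-means) can be sketched as a small pre-flight check. This is a minimal sketch and not part of BigQuery ML: the function name and the `model_kind` strings are hypothetical, and it inspects an in-memory sample of labels rather than a BigQuery table.

```python
import math

MAX_MULTICLASS_LABELS = 50  # limit stated in the list above


def check_labels(model_kind, labels):
    """Return a list of problems; an empty list means the sample looks fine."""
    problems = []
    if model_kind == "linear_regression":
        # Labels must be real-valued: no +/- infinity and no NaN.
        bad = [
            v for v in labels
            if not isinstance(v, (int, float)) or math.isinf(v) or math.isnan(v)
        ]
        if bad:
            problems.append(f"{len(bad)} label(s) are not finite real numbers")
    elif model_kind == "binary_logistic":
        distinct = len(set(labels))
        if distinct != 2:
            problems.append(f"expected 2 distinct labels, found {distinct}")
    elif model_kind == "multiclass_logistic":
        distinct = len(set(labels))
        if distinct > MAX_MULTICLASS_LABELS:
            problems.append(
                f"found {distinct} distinct labels, limit is {MAX_MULTICLASS_LABELS}"
            )
    elif model_kind == "kmeans":
        # K-means is unsupervised, so a label column is not used.
        if labels:
            problems.append("k-means training does not use labels")
    else:
        raise ValueError(f"unknown model kind: {model_kind}")
    return problems


print(check_labels("linear_regression", [12.0, 7.5, float("inf")]))
print(check_labels("binary_logistic", ["yes", "no", "yes"]))
print(check_labels("multiclass_logistic", ["low-value", "medium-value", "high-value"]))
```

Running a check like this on a sample of the label column before training can surface an unsuitable model choice early, for example a "binary" label that turns out to have a third value.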

In BigQuery ML, you can use a model with data from multiple BigQuery datasets for training and for prediction.

Model selection guide

Figure: Diagram to help you choose an ML model for your task (also available as a downloadable cheatsheet).

Advantages of BigQuery ML

BigQuery ML has the following advantages over other approaches to using ML with a cloud-based data warehouse:

  • BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets. Predictive analytics can guide business decision-making across the organization.
  • There is no need to program an ML solution using Python or Java. Models are trained and accessed in BigQuery using SQL, a language familiar to data analysts.

  • BigQuery ML increases the speed of model development and innovation by removing the need to export data from the data warehouse. Instead, BigQuery ML brings ML to the data. BigQuery ML has the following advantages over exporting and reformatting data:

    • Reduces complexity because fewer tools are required
    • Increases speed to production because model training in BigQuery does not require moving and formatting large amounts of data for Python-based ML frameworks

    For more information, watch the video How to accelerate machine learning development with BigQuery ML.

BigQuery ML and Vertex AI

BigQuery ML integrates with Vertex AI, Google Cloud's end-to-end AI/ML platform. When you register your BigQuery ML models to Vertex AI Model Registry, you can deploy these models to endpoints for online prediction.

For more information, watch the video How to simplify AI models with Vertex AI and BigQuery ML.
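One way registration can be requested is at model creation time, through options on the `CREATE MODEL` statement. The sketch below renders such a statement as a string. It assumes the `model_registry` and `vertex_ai_model_id` options; confirm both against the current `CREATE MODEL` reference before relying on them. The helper function and the model, table, and column names are hypothetical, and the snippet only builds text without contacting BigQuery or Vertex AI.

```python
def create_registered_model_sql(
    model_name, model_type, label_col, vertex_ai_model_id, training_query
):
    """Render a CREATE MODEL statement that also asks for Vertex AI registration."""
    options = [
        f"model_type='{model_type}'",
        f"input_label_cols=['{label_col}']",
        # Assumed option names; verify against the CREATE MODEL reference.
        "model_registry='vertex_ai'",
        f"vertex_ai_model_id='{vertex_ai_model_id}'",
    ]
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS({', '.join(options)}) AS\n"
        f"{training_query.strip()}"
    )


# Hypothetical names, used only for illustration.
stmt = create_registered_model_sql(
    model_name="mydataset.sales_model",
    model_type="LINEAR_REG",
    label_col="daily_sales",
    vertex_ai_model_id="sales_model",
    training_query="SELECT daily_sales, store_id, weekday FROM `mydataset.sales`",
)
print(stmt)
```

Deploying the registered model to an endpoint for online prediction is a separate step performed in Vertex AI and is not shown here.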

Supported regions

BigQuery ML is supported in the same regions as BigQuery. For more information, see BigQuery ML locations.

Pricing

BigQuery ML models are stored in BigQuery datasets like tables and views. For information about BigQuery ML pricing, see BigQuery ML pricing.

For information about BigQuery storage pricing, see Storage pricing. For information about BigQuery ML query pricing, see Query pricing.

Quotas

In addition to BigQuery ML-specific limits, queries that use BigQuery ML functions and CREATE MODEL statements are subject to the quotas and limits on BigQuery query jobs.

Limitations

What's next