What’s happening in BigQuery: integrated machine learning, maps, and more
In this month’s installment of What’s Happening in BigQuery, we’re sharing new features intended to make your life easier: some make BigQuery more performant and more cost effective, while others, like BigQuery ML, enable groundbreaking analysis tools in a cloud data warehouse that’s a first of its kind.
First off, we just finished Next ‘18, our annual event focused on all things cloud. For a video whirlwind tour of all of the data-oriented products and features we announced, check out our data analytics spotlight session.
BigQuery ML (beta)
BigQuery ML enables users to create and execute machine learning models in BigQuery using standard SQL queries. These extensions to our data warehouse platform democratize machine learning by enabling SQL practitioners to build models using existing SQL tools and skills. This new toolset also increases development speed by eliminating the need to move data. BigQuery is the first cloud data warehouse to support ML natively, in SQL, with no movement of data necessary. It’s possible to do feature engineering, creating models, and run predictions all from the same interface. To learn more, read our documentation or watch our breakout session from Next ‘18.
BigQuery GIS (public alpha)
BigQuery has another “first” in this edition of What’s Happening in BigQuery: the launch of BigQuery Geospatial Information Systems (BQ GIS for short). BigQuery is the first cloud MPP (Massively Parallel Processing) data warehouse to support geospatial data types and functions. BigQuery GIS also includes BQ Geo Viz, an open source tool to quickly visualize and style your query results on a map.
BQ GIS and BQ Geo Viz are both currently in public alpha. To request access, fill out this form.
We now support the following geographic types: Point, Linestring, Polygon, Multi-polygon, and Collections. Functions will look and feel very familiar to users of PostGIS as they follow the SQL/MM Spatial standard.
BigQuery clustering (beta)
To expand the partitioning functionality in BigQuery, we launched clustering. Clustering data within a partition allows users to sort data by up to four additional columns. The types supported are now:
NUMERIC. This technique is particularly useful for columns with very high cardinality. The real benefit of clustering, though, is faster search through your dataset, resulting in faster query times and reduced total billings.
BigQuery now supports using a sort property to eliminate scans of blocks of data within the table while evaluating queries that contain filters on the clustering columns. Further, by co-locating rows with similar values for those columns, clustering greatly improves performance of queries that aggregate data based on the clustering columns. Using partitioning and clustering together can drastically reduce the amount of data processed in a query which has the combined benefits of improving performance while reducing costs.
For more information, and samples, check out this 3-part series in our documentation:
For a live demo of the benefits of partitioning and clustering over a petabyte-sized table, watch this video.
(For the explanation of clustering, skip ahead to +10:00 minutes. For the demo, jump to +13:00 minutes.)
BigQuery Data Transfer Service
Part of BigQuery’s mission is to make it easier to move your data to the cloud, so you can start deriving insights. The BigQuery Data Transfer Service automates data movement from third-party SaaS applications to BigQuery as a fully managed data pipeline. This month, we launched beta support for a new service, Google Play, and the general availability of two existing services: Google Ads (formerly Google AdWords) and YouTube.
Already using BigQuery Data Transfer Service? We’d love to hear your feedback. Let us know by filling this short short survey.
BigQuery’s new user interface (UI) is now in beta
The new Cloud Console UI brings a number of improvements over the classic UI:
The BigQuery UI is now integrated with the Cloud Console experience (multi-login support, localized in the 18 supported languages, Cloud Storage location picker, and a UI that is now more consistent with other Google Cloud products).
Simpler project switching and project pinning in the left sidebar.
Standard SQL is the default language, and we now offer a client-side SQL parser.
Easy export of tables and query results to Data Studio Explorer.
Effective and easy data discovery with search across tables, datasets, and projects.
If you haven’t seen the new UI yet, you can try it out by clicking the blue button in the top right of your console window.
DDL (GA)Data definition language (DDL) statements allow you to create and modify tables and views using standard SQL query syntax. Now that DDL is generally available (GA), you can use
CREATE TABLE(as in the example below) and other statements in your production workflows.You can learn more by reading the DDL documentation.
CREATE TABLE mydataset.myclusteredtable
PARTITION BY DATE(timestamp)
description="a table clustered by customer_id"
SELECT timestamp, customer_id, transaction_amount
To learn more about these and other updates to BigQuery, subscribe to our release notes. You can also join us at our upcoming Next ’18 events in London and Tokyo, or sign up for our upcoming Cloud OnAir webinar on BigQuery. Our goal is to give you several ways to stay informed, but feel free to let us know if we can serve you better.