What’s happening in BigQuery: Efficient new views and Cloud AI integrations
BigQuery, Google Cloud’s petabyte-scale data warehouse, lets you ingest and analyze data quickly and with high availability, so you can find new insights, trends, and predictions to efficiently run your business. Our engineering team is continually making improvements to BigQuery so you can get even more out of it. Recently added BigQuery features include new materialized views, column-level security, and BigQuery ML additions.
Read on to learn more about these new capabilities and how they can help you speed up queries, add access controls, and focus on innovation.
Accelerate performance and improve cost savings with BigQuery materialized views
We’re happy to announce that BigQuery materialized views are now available in beta. BigQuery materialized views are precomputed views that periodically cache the results of a query in a BigQuery table for increased performance and efficiency. A materialized view is a database object that contains the results of a query. For example, it may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function. So when a query is re-run and a materialized view exists, the query doesn’t need to rescan all the tables again, but can quickly report the answer from the materialized view. This significantly improves performance and cuts down cost because the amount of data scanned is much smaller.
BigQuery materialized views are easy to set up and work with in real time. Here’s an overview of the benefits:
Zero maintenance: The system automatically synchronizes data refreshes with data changes in base tables. All incremental data changes from the base tables are automatically added to the materialized views—no user inputs required.
Always fresh: Materialized views are always consistent with the base table (including BigQuery streaming tables). Materialized views unite their data with the delta changes in the base table and return any new data in real time.
Self-tuning: If a query or part of a query against the source table can instead be resolved by querying the materialized views, BigQuery will rewrite (reroute) the query to use the materialized view for better performance and efficiency.
Customers such as Viant have seen tremendous benefits. "We have been using BigQuery Materialized Views in production for more than a year now,” says Adrian Witas, SVP and chief architect at Viant. “Not only does it come with great cost reduction, but it also hugely improves performance. Query latency is critical in our case where reporting data is directly consumed by the UI, which processes about 8,000 SQL queries per day with each query needing to complete in under a second. This has allowed us to successfully migrate our Vertica Reporting cluster to BigQuery."
Learn more about BigQuery materialized views in the documentation.
Use BigQuery ML models for online prediction and build recommendation models
You can export models from BigQuery ML in TensorFlow SavedModel format, and use those for online prediction in Cloud AI Platform, or your own serving layer. This also enables data scientists to tune the model in Python after it has been created in BigQuery ML, when further tuning is desired. Importing TensorFlow models for batch prediction in BigQuery is already generally available. This ability enables data scientists, data analysts, ML engineers, and data engineers to easily build end-to-end data to model deployment workflows. See the BigQuery ML documentation for more details.
In addition, BigQuery ML has added support for training matrix factorization, a new type of model, which is now in beta. Matrix factorization allows users to train recommendation systems on large datasets in BigQuery through SQL. To learn more, see the tutorials for implicit recommendations (user behavior-based; i.e.,used for product recommendations) and explicit recommendations (rating-based; i.e., app and movie recommendations).
Set access controls on data classes with column-level security
Sensitive data is often scattered alongside less sensitive data within the same dataset, and managing the appropriate access to that sensitive data becomes challenging if you only look at table, dataset, or project-level permissions. BigQuery column-level security, now in beta, lets you set access controls on data classes, abstracted by policy tags at column-level granularity. With this new capability, you can tag sensitive columns containing a protected data class (i.e., PII, financial, health) and restrict these columns to privileged groups.
Furthermore, you can create policy tag hierarchies, wherein tags in the root nodes aggregate permissions for tags at the leaf nodes. For example, if a “patient health” tag is nested below a general “restricted” tag, anyone who can access restricted data can access patient health data.
Price predictability with BigQuery Reservations
Price predictability continues to be top of mind for many organizations looking to understand and manage their data warehousing spend. We announced the beta release of BigQuery Reservations to help customers take advantage of BigQuery flat-rate pricing in an easy and flexible way. Advanced enterprise users can use Reservations to facilitate complex workload management scenarios. Reservations is now generally available in all BigQuery regions.
Command and control with BigQuery scripting and stored procedures
Scripting allows data engineers and data analysts to execute a wide range of tasks, from simple ones like running queries in a sequence to complex, multi-step tasks with control flow including IF statements and WHILE loops. Stored procedures allow you to save these scripts within BigQuery and share them so that any user can run them in the future.
We’re announcing general availability of scripting and stored procedures in all BigQuery regions. Since the beta release, we have added the following new capabilities:
Exception handling: Support for EXCEPTION clauses to handle errors generated during the script execution
System variable support: Support for key variables such as time zone, current project, and job ID, to display information specific to the user during execution.
Validation of the PROCEDURE body during creation: Validation of the commands in the body of the PROCEDURE during creation to detect problems before execution
JDBC and ODBC support: Support for standard APIs to enable execution by third-party execution engines that rely on the JDBC and ODBC drivers.
Read more about scripting and stored procedures in the BigQuery documentation.
In case you missed it:
Launched COVID-19 public dataset program, making a hosted repository of public datasets like Johns Hopkins Center for Systems Science and Engineering (JHU CSSE), the Global Health Data from the World Bank, and OpenStreetMap data free to access and query.
Federated ORC and Parquet federated queries in Cloud Storage are now generally available.
Geospatial data ingest: Launched integration of FME and BigQuery so that users can transform hundreds of different geospatial file types and projections directly into BigQuery tables.
In addition, check out the BigQuery Trial Slots promotional offer for new and returning BigQuery customers. This lets you purchase 500 slots for $500 per month for six months (a 95% discount from current monthly pricing). This limited time offer is subject to available capacity and qualification criteria and while supplies last. Learn more here. To express interest in this promotion, fill out this form and we'll be in touch with the next steps.