Developers & Practitioners

Have a cool summer with BigQuery user-friendly SQL

May 24, 2021

Dan Peng

Product Manager, Google Cloud

Jagan R. Athreya

Product Manager, Google Cloud

https://storage.googleapis.com/gweb-cloudblog-publish/images/AdobeStock_276669603_resized.max-1400x1400.jpg

With summer just around the corner, things are really heating up. But you’re in luck because this month BigQuery is supplying a cooler full of ice cold refreshments with this release of user-friendly SQL capabilities.

We are pleased to announce three categories of BigQuery user-friendly SQL launches: Powerful Analytics Features, Flexible Schema Handling, and New Geospatial Tools.

Powerful Analytics Features

These powerful SQL analytics features provide greater flexibility to analysts for organizing, filtering, and rendering data in BigQuery than ever before. You can enable spreadsheet-like functionality on summarized data using PIVOT and UNPIVOT and filter irrelevant data in analytic functions using QUALIFY.

Through this section, we will become familiar with these new features through examples using the BigQuery Public dataset, usa_names.

PIVOT/UNPIVOT (Preview)

One of the most time-consuming tasks for data analytics practitioners is wrangling data into the right shape. SQL is great for wrangling data, but sometimes you want to reformat a table as you would in a spreadsheet, pivoting rows and columns interchangeably. To support this use case, we are pleased to introduce PIVOT and UNPIVOT operators in BigQuery. PIVOT creates columns from unique values in rows by aggregating values, and UNPIVOT reverses this action.

The example below uses PIVOT on bigquery-public-data.usa_names.usa_1910_current to show the number of males and females born each year, representing each gender as a column. Then UNPIVOT reverses this action.

lang-sql

-- we start with SQL to create a simple table
-- we only include gender, year, and number. 
 
CREATE TABLE
  mydataset.sampletable1 AS (
  SELECT
    Gender,Year,SUM(Number) AS Number
  FROM
    `bigquery-public-data.usa_names.usa_1910_current`
  WHERE
    Year >= 2017
  GROUP BY
    Gender, Year);
 
 
-- The resulting table:
--+----------------------------------------+
--|   Gender   |    Year    |    Number    |
--+----------------------------------------+
--|      F     |    2019    |    1353716   |
--|      F     |    2017    |    1403989   |
--|      F     |    2018    |    1380382   |
--|      M     |    2018    |    1568678   |
--|      M     |    2019    |    1538056   |
--|      M     |    2017    |    1604609   |
--+----------------------------------------+
 
-- use PIVOT to create columns for “female” and “male” 
 
CREATE TABLE
  mydataset.Pivoted AS
SELECT
  year, male, female
FROM
  mydataset.sampletable1 
PIVOT( SUM(Number) FOR gender IN ('M' AS male,
  'F' AS female))
ORDER BY
  year;
 
-- The resulting pivoted table:
--+----------------------------------------+
--|    Year    |   female   |     male     |
--+----------------------------------------+
--|    2017    |   1403989   |    1604609  |
--|    2018    |   1380382   |    1568678  |
--|    2019    |   1353716   |    1538056  |
--+----------------------------------------+

-- UNPIVOT reverses the row/column rotation of PIVOT.

SELECT
  *
FROM
  mydataset.Pivoted 
UNPIVOT(number FOR gender IN (male AS 'M',
  female AS 'F'));

QUALIFY (Preview)

More advanced users of SQL know the power of analytic functions (aka window functions). These functions compute values over a group of rows, returning a single result for each row. For example, customers use analytic functions to compute a grand total, subtotal, moving average, rank, and more. With the announcement of support for QUALIFY, BigQuery users can now filter on the results of analytic functions by using the QUALIFY clause.

QUALIFY belongs in the family of query clauses used for filtering along with WHERE and HAVING. The WHERE clause is used to filter individual rows in a query. The HAVING clause is used to filter aggregate rows in a result set after aggregate functions and GROUP BY clauses. The QUALIFY clause is used to filter results of analytic functions.

To show the utility of QUALIFY, the example below uses QUALIFY to return the top 3 female names from each year in the last decade using from bigquery-public-data.usa_names.usa_1910_current.

lang-sql

Flexible Schema Handling

New SQL for administrators and data engineers enables table renaming for data pipeline processes, as well as flexible column management.

Table Rename (GA)

In data pipeline processes, tables are often created and then renamed so that they can make way for the next iteration of the pipeline run. To accomplish this, customers need a mechanism by which they can create a table and then subsequently rename it. Now if customers want to change this name using SQL, they can. Using the simple syntax that ALTER TABLE RENAME TO provides, customers will be able to rename a table after creation to clear the way for the next iteration of tables in the data pipeline.

lang-sql

DROP NOT NULL constraints on a column (GA)

While BigQuery has historically provided many tools available in the UI, CLI and APIs, we know that many administrators prefer interfacing with the database using SQL. BigQuery recently released DDL statements which enable data administrators to provision and manage datasets and tables, greatly simplifying provisioning and management. Today, we continue the next addition in this line of releases by announcing ALTER COLUMN DROP NOT NULL constraint on a column:

ALTER COLUMN DROP NOT NULL allows the administrator to remove the NOT NULL constraint from a column in BigQuery.

lang-sql

CREATE VIEW with column list (GA)

Views are used ubiquitously by BigQuery customers to capture business logic. Oftentimes, BigQuery users have business requirements to assign aliases to columns in views. Now BigQuery supports doing so upon view creation in a column name list format with the release of CREATE VIEW with column list syntax.

lang-sql

New Geospatial Tools

ST_POINTN, ST_STARTPOINT, and ST_ENDPOINT

Geospatial data is incredibly valuable to data analytics customers dealing with data from the physical world. BigQuery has very strong geospatial function support to help customers process marketing data, track storms, or manage self-driving cars. Particularly for analyzing vehicle or location tracking data, we’re thrilled to provide three new functions to allow users to easily extract or filter on key points:

For example, when working with vehicle histories, ST_POINTN, ST_STARTPOINT, and ST_ENDPOINT allow users to extract elements such as the start and the end of a trip. For identifying origin-destination pairs these functions will make that task much easier.

lang-sql

As sure as a hot summer day pairs well with an ice-cold beverage, these new user-friendly SQL features in BigQuery pair well with your data analytics workflows. To learn more about BigQuery, visit our website, and get started immediately with the free BigQuery Sandbox.

Posted in