Developers & Practitioners

Smile with new user-friendly SQL capabilities in BigQuery

Smile

October happens to be the month to celebrate World Smile Day when Harvey Ball, the inventor of the smiley face declared this day as such to give people a reason to smile. This month, BigQuery users have a lot of new reasons to smile about with the release of new user-friendly SQL capabilities now generally available. 

New table operations commands 

BigQuery users can now use table-level operations to evolve their schema and prepare tables for new data using the new commands.

ADD COLUMN: Adding new columns to tables in BigQuery was previously supported only via the console UI, API or bq cli. The ALTER TABLE command in BigQuery now supports the ADD COLUMN clause to add new columns to tables. Documentation

TRUNCATE TABLE: Deleting the entire contents of a table in BigQuery previously required you to run DELETE commands which incurred DML cost. Now, you can run the TRUNCATE command to delete all records of tables which runs as one of the many free operations supported by BigQuery. Documentation

Unicode table naming: Tables naming in BigQuery were previously restricted to a subset of the ASCII character set (letters, numbers and underscore). BigQuery now supports Unicode table names, which means in addition to having a CUSTOMER table, you can also have your customer data in tables named ग्राहक, お客様, 顾客, заказчик or الزبون in BigQuery. Documentation


Language: SQL

  -- Create the initial table employees on which you can add columns
-- Replace the "dataset" below with your entries
CREATE TABLE dataset.employés 
(
	employee_id INT64,
	manager_id INT64
);

-- Add new columns to the tables using ADD COLUMN clause
ALTER TABLE dataset.employés
ADD COLUMN last_name STRING,
ADD COLUMN date_of_birth DATE,
ADD COLUMN work_location GEOGRAPHY
;

-- Empty out the contents of the employee table to reload it
-- with data for the new columns
TRUNCATE TABLE dataset.employés
;

Reading from and writing to external storage

BigQuery provides users the ability to read data in external storage buckets. The new SQL commands allow users to configure tables within BigQuery pointing to these buckets as well as export data from queries into storage buckets.

CREATE OR REPLACE EXTERNAL TABLE: BigQuery allows you to query data in Google Cloud Storage (GCS) buckets by registering them as external tables within BigQuery. You can now run CREATE EXTERNAL TABLE commands to create a table definition via SQL within BigQuery for these external data sources. If your external data is partitioned using hive partitioning, you can specify the partitioning scheme using the optional hive partition URI prefix as a part of the DDL. Documentation

Language: SQL

  -- Create an external table automatically detecting table schema 
CREATE EXTERNAL TABLE IF NOT EXISTS dataset.whale_detection_data
OPTIONS (
uris=["gs://noaa-passive-bioacoustic/cornell_sperm_whale_detections/detection_data/GoMexSpermWhalePresence-2010*.csv"], 
  	format=CSV
);
-- Create an external table with explicit schema declaration. All the fields are set to data type STRING.
CREATE OR REPLACE EXTERNAL TABLE dataset.whale_detection_data (
  	Julian_Date STRING, 
  	Date STRING, 
  	Month STRING, 
  	Day STRING, 
  	Year STRING, 
  	Season STRING, 
  	Region STRING, 
  	Site_HF STRING, 
  	Hours_Presence STRING, 
  	DayHr STRING, 
  	Proportion_Presence STRING, 
  	CorrPres STRING, 
  	CorArcsin STRING
)
OPTIONS (
uris=["gs://noaa-passive-bioacoustic/cornell_sperm_whale_detections/detection_data/GoMexSpermWhalePresence-2010*.csv"], 
 	 format=CSV
);

DROP EXTERNAL TABLE: You can delete the table definition for external tables using the DROP EXTERNAL TABLE command. Note that this only removes the table definition within BigQuery and does not delete the data stored in the storage bucket. Also, the EXTERNAL prefix is required for dropping external tables. DROP TABLE table_name (without the EXTERNAL prefix) only works for regular BigQuery tables and will fail if attempted for external tables. Documentation

Language: SQL

  -- Drop an external table on storage bucket if it exists
DROP EXTERNAL TABLE IF EXISTS dataset.whale_detection_data;

EXPORT DATA: As a part of data pipeline operations, BigQuery users want to export the output of their analytical queries into external storage to share these results with other stakeholders or feed them into ongoing data pipelines, e.g. data science, data lakes. You can now use EXPORT DATA command within the console UI or in scripts to save the results of a query in a Google Cloud Storage bucket in any of the data formats and compression types supported by BigQuery. Documentation.

Language: SQL

  -- Export result of a query execution to GCS bucket

EXPORT DATA OPTIONS (
uri="gs://mybucket/myfile/*", 
  	format=CSV
) AS 
SELECT corpus, sum(word_count) as word_count 
FROM `bigquery-public-data.samples.shakespeare` 
GROUP BY corpus 
ORDER BY 2 DESC;

Scripting and SQL improvements

Building on the scripting and stored procedures SQL support, BigQuery is introducing Dynamic SQL, authorized user defined functions and SQL language improvements.

EXECUTE IMMEDIATE: Dynamic SQL is a powerful technique within SQL scripting by which you can use SQL to generate and execute SQL dynamically. EXECUTE IMMEDIATE accepts an expression which could be a combination of static strings and variables to produce a valid SQL statement on-the-fly which is then immediately executed. For security reasons, only a single SQL statement can be executed with each invocation of EXECUTE IMMEDIATE. Documentation

Language: SQL

  -- Example of Dynamic SQL
DECLARE year INT64 DEFAULT 2015;
EXECUTE IMMEDIATE format("""
SELECT COUNT(*)
FROM `bigquery-public-data`.new_york.tlc_yellow_trips_%d
""", year);

Authorized User-Defined Functions (UDFs): An authorized UDF is a user defined function that is authorized through the use of IAM permission to enable access to a particular dataset. The UDF can query tables in the dataset, even if the user who invokes the UDF does not have access to those tables. Documentation


Duplicate column names in query results: BigQuery would previously not allow two columns in the query results with the same name, which required the analyst to manually edit the query to look for and rename any duplicate columns. You can now run queries in BigQuery with the same column names and BigQuery will automatically add a number suffix (_n) to distinguish the duplicate columns from each other. Documentation

Language: SQL

  -- Example of duplicate column name
SELECT duration, durationMinutes/60 duration
FROM bigquery-public-data.baseball.games_post_wide
LIMIT 10;

New DATE functions

BigQuery is introducing new functions to perform date operations required for business processes.

DATE ± n (days): Arithmetic operations on DATEs, e.g. adding or subtracting dates, in BigQuery were supported previously using the DATE_ADD and DATE_SUB functions to add and subtracts days from a given date. You can now use simple arithmetic operators (+ or –) to add and subtract days, e.g. HIRE_DATE+7 or DELIVERY_DATE–2. Documentation

LAST_DAY: This function returns the last date of a time unit for a given date expression, e.g. the last date of the month or year that the date belongs to. Documentation

Language: SQL

  SELECT LAST_DAY(DATE '2008-11-25', MONTH)+1 AS first_day_next_month;

New STRING functions

BigQuery is also introducing new functions to extract and manipulate string data to enable search or text matching.

Concatenation operator (“||”) support: Concatenating strings in BigQuery used to require the use of the CONCAT function to combine two or more strings. You can now use ||, the concatenation operator, to combine multiple strings making it as easy as “A”||“B”||“C” to make “ABC”. Documentation

Language: SQL

  SELECT ‘a’ || ’b’ || ’c’  string_concatenation, [‘a’,’b’] || [‘c’] array_concatenation;

LEFT: The function, LEFT(value, length) returns the leftmost characters of a string value corresponding to the length specified in the function. Documentation

RIGHT: The function, RIGHT(value, length) returns the rightmost characters of a string value corresponding to the length specified in the function. Documentation

INSTR: The function, INSTR(source_value, search_value[, position[, occurrence]]), reports the location in the string of the specified occurence of a search value starting with specified position in a string. Documentation

INITCAP: The function, INITCAP(value[, delimiters]), renders the first letter of each word in string in uppercase while rendering all other characters in lowercase. The optional delimiter can be specified to override the standard punctuation characters used in sentences. Documentation

TRANSLATE: The function, TRANSLATE(expression, source_characters, target_characters), maps the source characters in the string to the target characters and returns the translated value. Documentation

ASCII: The function, ASCII(value), returns the ASCII code for the first character in specified string value. Documentation

UNICODE: The function, UNICODE(value), returns the Unicode code for the first character in specified string value. Documentation

CHR: The function, CHR(value), returns the Unicode character corresponding to the Unicode code point specified in the value. Documentation

SOUNDEX: The function, SOUNDEX(value), returns the Soundex code, which is the phonetic representation of a string. The Soundex code consists of a letter followed by 3 digits is used to help determine whether two strings have similar English-language pronunciation. Documentation

REGEXP_SUBSTR: The function, REGEXP_SUBSTR(value, regexp[, position[, occurrence]]), a synonym of the function REGEXP_EXTRACT, returns the substring in value that matches the regular expression, regexp. Documentation

REGEXP_EXTRACT: The existing function, REGEXP_EXTRACT(value, regexp[, position[, occurrence]]), has been enhanced to support position and occurence to specify the starting position from which the regular expression should be matched and the specified occurence to look for as additional optional arguments. Documentation

REGEXP_INSTR: The function, REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]), returns the position of a specific instance of regexp in source_value. Documentation

OCTET_LENGTH: This function, OCTET_LENGTH (value), a synonym of the function BYTE_LENGTH, returns the length of the string value in bytes. Documentation

Language: SQL

  -- Examples of String functions
SELECT 
 REGEXP_INSTR('Hello Helloo and Hellooo', 'H?ello+', 1, 2),      
 INSTR('abc@google.com', '@google.com'),        
 REGEXP_EXTRACT('Hello Helloo and Hellooo', 'H?ello+', 1, 2),      
 REGEXP_SUBSTR('Hello Helloo and Hellooo', 'H?ello+', 1, 2),   
 SOUNDEX("terry"), SOUNDEX("teri"),            
 INITCAP("hello, this is a demo"),              
 TRANSLATE("this is a cookie", "co", "ku"),    
 ASCII("abc"),                             
 CHR(97),                                   
 UNICODE("学生"),                              
 LEFT("206-999-9999", 3),                      
 RIGHT("2000-08-24", 2),                      
 OCTET_LENGTH(b"абвгд");

Expanded INFORMATION_SCHEMA views

INFORMATION_SCHEMA, specified in the ANSI SQL standard, allows users to query the meta data about various objects or entities that contain or reference data, such as, tables or view or programs, such as stored procedures and user-defined functions.

Information schema for Tables: You can now look up table metadata through the TABLES, TABLE_OPTIONS, COLUMNS, and COLUMN_FIELD_PATHS views within INFORMATION_SCHEMA.

Language: SQL

  # Example where we get the latest schema for a wildcard query.
SELECT * FROM my_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name =
  (SELECT table_name
   FROM my_dataset.INFORMATION_SCHEMA.TABLES
   WHERE STARTS_WITH(table_name, 'my_prefix')
   ORDER BY creation_time DESC
   LIMIT 1)

Information schema for Views: You can use VIEWS within INFORMATION_SCHEMA to look up view metadata.

Information schema for Routines: Routine metadata is contained in the ROUTINES, ROUTINE_OPTIONS, and PARAMETERS views with INFORMATION_SCHEMA.

Information schema for Datasets: Dataset information is contained within the SCHEMATA and SCHEMATA_OPTIONS views within INFORMATION_SCHEMA. 

Language: SQL

  # Example combining INFORMATION_SCHEMA with dynamic SQL to query common
# schemas across a region.
DECLARE unions DEFAULT
  (SELECT STRING_AGG(
     "SELECT my_col FROM " || table_schema || "." || table_name,
     " UNION ALL ")
   FROM `region-us`.INFORMATION_SCHEMA.COLUMNS
   WHERE column_name = "my_col"
     AND data_type = "STRING");
EXECUTE IMMEDIATE "SELECT my_col FROM (" || unions || ")";

We hope this harvest of new SQL capabilities brings a smile on the faces of our BigQuery users. To get started with BigQuery and other data analytics products, access the free trial: https://cloud.google.com/free. You can also try BigQuery with no charge in our sandbox.