Smile with new user-friendly SQL capabilities in BigQuery
Jagan R. Athreya
Group Product Manager, Spanner
October happens to be the month to celebrate World Smile Day when Harvey Ball, the inventor of the smiley face declared this day as such to give people a reason to smile. This month, BigQuery users have a lot of new reasons to smile about with the release of new user-friendly SQL capabilities now generally available.
New table operations commands
BigQuery users can now use table-level operations to evolve their schema and prepare tables for new data using the new commands.
ADD COLUMN: Adding new columns to tables in BigQuery was previously supported only via the console UI, API or bq cli. The ALTER TABLE command in BigQuery now supports the ADD COLUMN clause to add new columns to tables. Documentation
TRUNCATE TABLE: Deleting the entire contents of a table in BigQuery previously required you to run DELETE commands which incurred DML cost. Now, you can run the TRUNCATE command to delete all records of tables which runs as one of the many free operations supported by BigQuery. Documentation
Unicode table naming: Tables naming in BigQuery were previously restricted to a subset of the ASCII character set (letters, numbers and underscore). BigQuery now supports Unicode table names, which means in addition to having a CUSTOMER table, you can also have your customer data in tables named ग्राहक, お客様, 顾客, заказчик or الزبون in BigQuery. Documentation
Reading from and writing to external storage
BigQuery provides users the ability to read data in external storage buckets. The new SQL commands allow users to configure tables within BigQuery pointing to these buckets as well as export data from queries into storage buckets.
CREATE OR REPLACE EXTERNAL TABLE: BigQuery allows you to query data in Google Cloud Storage (GCS) buckets by registering them as external tables within BigQuery. You can now run CREATE EXTERNAL TABLE commands to create a table definition via SQL within BigQuery for these external data sources. If your external data is partitioned using hive partitioning, you can specify the partitioning scheme using the optional hive partition URI prefix as a part of the DDL. Documentation
DROP EXTERNAL TABLE: You can delete the table definition for external tables using the DROP EXTERNAL TABLE command. Note that this only removes the table definition within BigQuery and does not delete the data stored in the storage bucket. Also, the EXTERNAL prefix is required for dropping external tables. DROP TABLE table_name (without the EXTERNAL prefix) only works for regular BigQuery tables and will fail if attempted for external tables. Documentation
EXPORT DATA: As a part of data pipeline operations, BigQuery users want to export the output of their analytical queries into external storage to share these results with other stakeholders or feed them into ongoing data pipelines, e.g. data science, data lakes. You can now use EXPORT DATA command within the console UI or in scripts to save the results of a query in a Google Cloud Storage bucket in any of the data formats and compression types supported by BigQuery. Documentation.
Scripting and SQL improvements
Building on the scripting and stored procedures SQL support, BigQuery is introducing Dynamic SQL, authorized user defined functions and SQL language improvements.
EXECUTE IMMEDIATE: Dynamic SQL is a powerful technique within SQL scripting by which you can use SQL to generate and execute SQL dynamically. EXECUTE IMMEDIATE accepts an expression which could be a combination of static strings and variables to produce a valid SQL statement on-the-fly which is then immediately executed. For security reasons, only a single SQL statement can be executed with each invocation of EXECUTE IMMEDIATE. Documentation
Authorized User-Defined Functions (UDFs): An authorized UDF is a user defined function that is authorized through the use of IAM permission to enable access to a particular dataset. The UDF can query tables in the dataset, even if the user who invokes the UDF does not have access to those tables. Documentation
Duplicate column names in query results: BigQuery would previously not allow two columns in the query results with the same name, which required the analyst to manually edit the query to look for and rename any duplicate columns. You can now run queries in BigQuery with the same column names and BigQuery will automatically add a number suffix (_n) to distinguish the duplicate columns from each other. Documentation
New DATE functions
BigQuery is introducing new functions to perform date operations required for business processes.
DATE ± n (days): Arithmetic operations on DATEs, e.g. adding or subtracting dates, in BigQuery were supported previously using the DATE_ADD and DATE_SUB functions to add and subtracts days from a given date. You can now use simple arithmetic operators (+ or –) to add and subtract days, e.g. HIRE_DATE+7 or DELIVERY_DATE–2. Documentation
LAST_DAY: This function returns the last date of a time unit for a given date expression, e.g. the last date of the month or year that the date belongs to. Documentation
New STRING functions
BigQuery is also introducing new functions to extract and manipulate string data to enable search or text matching.
Concatenation operator (“||”) support: Concatenating strings in BigQuery used to require the use of the CONCAT function to combine two or more strings. You can now use ||, the concatenation operator, to combine multiple strings making it as easy as “A”||“B”||“C” to make “ABC”. Documentation
LEFT: The function, LEFT(value, length) returns the leftmost characters of a string value corresponding to the length specified in the function. Documentation
RIGHT: The function, RIGHT(value, length) returns the rightmost characters of a string value corresponding to the length specified in the function. Documentation
INSTR: The function, INSTR(source_value, search_value[, position[, occurrence]]), reports the location in the string of the specified occurence of a search value starting with specified position in a string. Documentation
INITCAP: The function, INITCAP(value[, delimiters]), renders the first letter of each word in string in uppercase while rendering all other characters in lowercase. The optional delimiter can be specified to override the standard punctuation characters used in sentences. Documentation
TRANSLATE: The function, TRANSLATE(expression, source_characters, target_characters), maps the source characters in the string to the target characters and returns the translated value. Documentation
ASCII: The function, ASCII(value), returns the ASCII code for the first character in specified string value. Documentation
UNICODE: The function, UNICODE(value), returns the Unicode code for the first character in specified string value. Documentation
CHR: The function, CHR(value), returns the Unicode character corresponding to the Unicode code point specified in the value. Documentation
SOUNDEX: The function, SOUNDEX(value), returns the Soundex code, which is the phonetic representation of a string. The Soundex code consists of a letter followed by 3 digits is used to help determine whether two strings have similar English-language pronunciation. Documentation
REGEXP_SUBSTR: The function, REGEXP_SUBSTR(value, regexp[, position[, occurrence]]), a synonym of the function REGEXP_EXTRACT, returns the substring in value that matches the regular expression, regexp. Documentation
REGEXP_EXTRACT: The existing function, REGEXP_EXTRACT(value, regexp[, position[, occurrence]]), has been enhanced to support position and occurence to specify the starting position from which the regular expression should be matched and the specified occurence to look for as additional optional arguments. Documentation
REGEXP_INSTR: The function, REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]), returns the position of a specific instance of regexp in source_value. Documentation
OCTET_LENGTH: This function, OCTET_LENGTH (value), a synonym of the function BYTE_LENGTH, returns the length of the string value in bytes. Documentation
Expanded INFORMATION_SCHEMA views
INFORMATION_SCHEMA, specified in the ANSI SQL standard, allows users to query the meta data about various objects or entities that contain or reference data, such as, tables or view or programs, such as stored procedures and user-defined functions.
Information schema for Tables: You can now look up table metadata through the TABLES, TABLE_OPTIONS, COLUMNS, and COLUMN_FIELD_PATHS views within INFORMATION_SCHEMA.
Information schema for Views: You can use VIEWS within INFORMATION_SCHEMA to look up view metadata.
Information schema for Routines: Routine metadata is contained in the ROUTINES, ROUTINE_OPTIONS, and PARAMETERS views with INFORMATION_SCHEMA.
Information schema for Datasets: Dataset information is contained within the SCHEMATA and SCHEMATA_OPTIONS views within INFORMATION_SCHEMA.
We hope this harvest of new SQL capabilities brings a smile on the faces of our BigQuery users. To get started with BigQuery and other data analytics products, access the free trial: https://cloud.google.com/free. You can also try BigQuery with no charge in our sandbox.