Pipe syntax

Pipe syntax is an extension to GoogleSQL that supports a linear query structure designed to make your queries easier to read, write, and maintain.

To enroll a project in the pipe syntax preview, fill out the BigQuery pipe syntax enrollment form.

Overview

You can use pipe syntax anywhere you write GoogleSQL. Pipe syntax supports the same operations as existing GoogleSQL syntax, or standard syntax—for instance, selection, aggregation and grouping, joining, and filtering—but the operations can be applied in any order, any number of times. The linear structure of pipe syntax lets you write queries so that the order of the query syntax matches the order of logical steps taken to build the result table.

Queries that use pipe syntax are priced, executed, and optimized the same way as their equivalent standard syntax queries. When you write queries with pipe syntax, follow the guidelines to estimate costs and optimize query computation.

Standard syntax suffers from issues that can make it difficult to read, write, and maintain. The following table shows how pipe syntax addresses these issues:

Standard syntax Pipe syntax
Clauses must appear in a particular order. Pipe operators can be applied in any order.
More complex queries, such as queries with multi-level aggregation, usually require CTEs or nested subqueries. More complex queries are usually expressed by adding pipe operators to the end of the query.
During aggregation, columns are repeated in the SELECT, GROUP BY, and ORDER BY clauses. Columns can be listed only once per aggregation.

Basic syntax

In pipe syntax, queries start with a standard SQL query or a FROM clause. For example, a standalone FROM clause, such as FROM mydataset.mytable, is valid pipe syntax. The result of the standard SQL query or the table from the FROM clause can then be passed as input to a pipe symbol, |>, followed by a pipe operator name and any arguments to that operator. The pipe operator transforms the table in some way, and the result of that transformation can be passed to another pipe operator.

You can use any number of pipe operators in your query to do things such as select, order, filter, join, or aggregate columns. The names of pipe operators match their standard syntax counterparts and generally have the same behavior. The main difference between standard syntax and pipe syntax is the way you structure your query. As the logic expressed by your query becomes more complex, the query can still be expressed as a linear sequence of pipe operators, without using deeply nested subqueries, making it easier to read and reason about.

Consider the following table:

CREATE TABLE mydataset.produce AS (
  SELECT 'apples' AS item, 2 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'apples' AS item, 7 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'carrots' AS item, 0 AS sales, 'vegetable' AS category
  UNION ALL
  SELECT 'bananas' AS item, 15 AS sales, 'fruit' AS category);

The following queries each contain valid pipe syntax that shows how you can build a query sequentially.

Queries can start with a FROM clause and don't need to contain a pipe symbol:

-- View the table
FROM mydataset.produce;

/*---------+-------+-----------+
 | item    | sales | category  |
 +---------+-------+-----------+
 | apples  | 7     | fruit     |
 | apples  | 2     | fruit     |
 | carrots | 0     | vegetable |
 | bananas | 15    | fruit     |
 +---------+-------+-----------*/

You can filter with a WHERE pipe operator:

-- Filter items with no sales
FROM mydataset.produce
|> WHERE sales > 0;

/*---------+-------+-----------+
 | item    | sales | category  |
 +---------+-------+-----------+
 | apples  | 7     | fruit     |
 | apples  | 2     | fruit     |
 | bananas | 15    | fruit     |
 +---------+-------+-----------*/

To perform aggregation, use the AGGREGATE pipe operator, followed by any number of aggregate functions, followed by a GROUP BY clause. The GROUP BY clause is part of the AGGREGATE pipe operator and is not separated by a pipe (|>).

-- Compute total sales by item
FROM mydataset.produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(*) AS num_sales
   GROUP BY item;

/*---------+-------------+-----------+
 | item    | total_sales | num_sales |
 +---------+-------------+-----------+
 | apples  | 9           | 2         |
 | bananas | 15          | 1         |
 +---------+-------------+-----------*/

Now suppose you have the following table that contains an ID for each item:

CREATE TABLE mydataset.item_data AS (
  SELECT "apples" AS item, "123" AS id
  UNION ALL
  SELECT "bananas" AS item, "456" AS id
  UNION ALL
  SELECT "carrots" AS item, "789" AS id
);

You can use the JOIN pipe operator to join the results of the previous query with this table to include each item's ID:

FROM mydataset.produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(*) AS num_sales
   GROUP BY item
|> JOIN mydataset.item_data USING(item);

/*---------+-------------+-----------+-----+
 | item    | total_sales | num_sales | id  |
 +---------+-------------+-----------+-----+
 | apples  | 9           | 2         | 123 |
 | bananas | 15          | 1         | 456 |
 +---------+-------------+-----------+-----*/

Pipe syntax has the following key characteristics:

  • Pipe operators can be applied in any order, any number of times.
  • Pipe syntax works anywhere standard syntax is supported: queries, views, table-valued functions, and other contexts.
  • Pipe syntax can be mixed with standard syntax in the same query. For example, subqueries can use different syntax from the parent query.
  • A pipe operator can see every alias that exists in the table preceding the pipe.

Key differences from standard syntax

Pipe syntax differs from standard syntax in the following ways:

  • Queries can start with a FROM clause.
  • The SELECT pipe operator doesn't perform aggregation. You must use the AGGREGATE pipe operator instead.
  • Filtering is always done with the WHERE pipe operator, which can be applied anywhere. The WHERE pipe operator, which replaces HAVING and QUALIFY, can filter the results of aggregation or window functions.

For more information and a complete list of pipe operators, see Pipe query syntax.

Use cases

Common use cases for pipe syntax include the following:

  • Ad-hoc analysis and incremental query building: The logical order of operations makes it easier to write and debug queries. The prefix of any query up to a pipe symbol |> is a valid query, which helps you view intermediate results in a long query. The productivity gains can speed up the development process across your organization.
  • Log analytics: There exist other types of pipe-like syntax that are popular among log analytics users. Pipe syntax provides a familiar structure that simplifies onboarding for those users to Log Analytics and BigQuery.

Additional features in pipe syntax

With few exceptions, pipe syntax supports all operators that standard syntax does with the same syntax. In addition, pipe syntax introduces the following pipe operators.

EXTEND pipe operator

The EXTEND pipe operator, which can only be used immediately after a pipe symbol, lets you append computed columns to the current table. The EXTEND pipe operator is similar to the SELECT *, new_column statement, but it gives you more flexibility in referencing column aliases.

Consider the following table that contains two test scores for each person:

CREATE TABLE mydataset.scores AS (
  SELECT 'Alex' AS student, 9 AS score1, 10 AS score2, 10 AS points_possible
  UNION ALL
  SELECT 'Dana' AS student, 5 AS score1, 7 AS score2, 10 AS points_possible);

/*---------+--------+--------+-----------------+
 | student | score1 | score2 | points_possible |
 +---------+--------+--------+-----------------+
 | Alex    | 9      | 10     | 10              |
 | Dana    | 5      | 7      | 10              |
 +---------+--------+--------+-----------------*/

Suppose you want to compute the average raw score and average percentage score that each student received on the test. In standard syntax, later columns in a SELECT statement don't have visibility to earlier aliases. To avoid a subquery, you have to repeat the expression for the average:

SELECT student,
  (score1 + score2) / 2 AS average_score,
  (score1 + score2) / 2 / points_possible AS average_percent
FROM mydataset.scores;

The EXTEND pipe operator can reference previously used aliases, making the query easier to read and less error prone:

FROM mydataset.scores
|> EXTEND (score1 + score2) / 2 AS average_score
|> EXTEND average_score / points_possible AS average_percent
|> SELECT student, average_score, average_percent;

/*---------+---------------+-----------------+
 | student | average_score | average_percent |
 +---------+---------------+-----------------+
 | Alex    | 9.5           | .95             |
 | Dana    | 6.0           | 0.6             |
 +---------+---------------+-----------------*/

SET pipe operator

The SET pipe operator, which can only be used immediately after a pipe symbol, lets you replace the value of columns in the current table. The SET pipe operator is similar to the SELECT * REPLACE (expression AS column) statement. You can reference the original value by qualifying the column name with a table alias.

FROM (SELECT 3 AS x, 5 AS y)
|> SET x = 2 * x;

/*---+---+
 | x | y |
 +---+---+
 | 6 | 5 |
 +---+---*/

DROP pipe operator

The DROP pipe operator, which can only be used immediately after a pipe symbol, lets you remove columns from the current table. The DROP pipe operator is similar to the SELECT * EXCEPT(column) statement. After a column is dropped you can still reference the original value by qualifying the column name with a table alias.

FROM (SELECT 1 AS x, 2 AS y) AS t
|> DROP x;

/*---+
 | y |
 +---+
 | 2 |
 +---*/

RENAME pipe operator

The RENAME pipe operator, which can only be used immediately after a pipe symbol, lets you rename columns from the current table. The RENAME pipe operator is similar to the SELECT * EXCEPT(old_column), old_column AS new_column statement.

FROM (SELECT 1 AS x, 2 AS y, 3 AS z) AS t
|> RENAME y AS w;

/*---+---+---+
 | x | w | z |
 +---+---+---+
 | 1 | 2 | 3 |
 +---+---+---*/

AGGREGATE pipe operator

To perform aggregation in pipe syntax, use the AGGREGATE pipe operator, followed by any number of aggregate functions, followed by a GROUP BY clause. You don't need to repeat columns in a SELECT clause.

The examples in this section use the produce table:

CREATE TABLE mydataset.produce AS (
  SELECT 'apples' AS item, 2 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'apples' AS item, 7 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'carrots' AS item, 0 AS sales, 'vegetable' AS category
  UNION ALL
  SELECT 'bananas' AS item, 15 AS sales, 'fruit' AS category);

/*---------+-------+-----------+
 | item    | sales | category  |
 +---------+-------+-----------+
 | apples  | 7     | fruit     |
 | apples  | 2     | fruit     |
 | carrots | 0     | vegetable |
 | bananas | 15    | fruit     |
 +---------+-------+-----------*/
FROM mydataset.produce
|> AGGREGATE SUM(sales) AS total, COUNT(*) AS num_records
   GROUP BY item, category;

/*---------+-----------+-------+-------------+
 | item    | category  | total | num_records |
 +---------+-----------+-------+-------------+
 | apples  | fruit     | 9     | 2           |
 | carrots | vegetable | 0     | 1           |
 | bananas | fruit     | 15    | 1           |
 +---------+-----------+-------+-------------*/

If you are ready to order your results immediately following aggregation, you can mark the columns in the GROUP BY clause that you want to order with ASC or DESC. Unmarked columns are not ordered.

If you want to order all columns, then you can replace the GROUP BY clause with a GROUP AND ORDER BY clause, which orders every column in ascending order by default. You can specify DESC following the columns that you want to order in descending order. For example, the following three queries are equivalent:

-- Use a separate ORDER BY clause
FROM mydataset.produce
|> AGGREGATE SUM(sales) AS total, COUNT(*) AS num_records
   GROUP BY category, item
|> ORDER BY category DESC, item;
-- Explicitly mark how to order columns in the GROUP BY clause
FROM mydataset.produce
|> AGGREGATE SUM(sales) AS total, COUNT(*) AS num_records
   GROUP BY category DESC, item ASC;
-- Only mark descending columns in the GROUP AND ORDER BY clause
FROM mydataset.produce
|> AGGREGATE SUM(sales) AS total, COUNT(*) AS num_records
   GROUP AND ORDER BY category DESC, item;

The advantage of using a GROUP AND ORDER BY clause is that you don't have to repeat column names in two places.

To perform full table aggregation, use GROUP BY() or omit the GROUP BY clause entirely:

FROM mydataset.produce
|> AGGREGATE SUM(sales) AS total, COUNT(*) AS num_records;

/*-------+-------------+
 | total | num_records |
 +-------+-------------+
 | 24    | 4           |
 +-------+-------------*/

JOIN pipe operator

The JOIN pipe operator lets you join the current table with another table and supports the standard join operations, including CROSS, INNER, LEFT, RIGHT, and FULL.

The following examples reference the produce and item_data tables:

CREATE TABLE mydataset.produce AS (
  SELECT 'apples' AS item, 2 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'apples' AS item, 7 AS sales, 'fruit' AS category
  UNION ALL
  SELECT 'carrots' AS item, 0 AS sales, 'vegetable' AS category
  UNION ALL
  SELECT 'bananas' AS item, 15 AS sales, 'fruit' AS category);
CREATE TABLE mydataset.item_data AS (
  SELECT "apples" AS item, "123" AS id
  UNION ALL
  SELECT "bananas" AS item, "456" AS id
  UNION ALL
  SELECT "carrots" AS item, "789" AS id
);

The following example uses a USING clause and avoids column ambiguity:

FROM `mydataset.produce`
|> JOIN `mydataset.item_data` USING(item)
|> WHERE item = "apples";

/*--------+-------+----------+-----+
 | item   | sales | category | id  |
 +--------+-------+----------+-----+
 | apples | 2     | fruit    | 123 |
 | apples | 7     | fruit    | 123 |
 +--------+-------+----------+-----*/

To reference columns in the current table, such as to disambiguate columns in an ON clause, you need to alias the current table by using the AS pipe operator. You can optionally alias the joined table. You can reference both aliases following subsequent pipe operators:

FROM `mydataset.produce`
|> AS produce_table
|> JOIN `mydataset.item_data` AS item_table
   ON produce_table.item = item_table.item
|> WHERE produce_table.item = "bananas"
|> SELECT item_table.item, sales, id;

/*---------+-------+-----+
 | item    | sales | id  |
 +---------+-------+-----+
 | bananas | 15    | 123 |
 +---------+-------+-----*/

The right-hand side of the join doesn't have visibility to the left-hand side of the join, which means you can't join the current table with itself. For example, the following query fails:

-- This query doesn't work.
FROM `mydataset.produce`
|> AS produce_table
|> JOIN produce_table AS produce_table_2 USING(item);

To perform a self-join with a modified table, you can use a common table expression inside of a WITH clause.

WITH cte_table AS (
  FROM `mydataset.produce`
  |> WHERE item = "carrots"
)
FROM cte_table
|> JOIN cte_table AS cte_table_2 USING(item);

Example

Consider the following table with information about customer orders:

CREATE TABLE mydataset.customer_orders AS (
  SELECT 1 AS customer_id, 100 AS order_id, "WA" AS state, 5 AS cost, "clothing" AS item_type
  UNION ALL
  SELECT 1 AS customer_id, 101 AS order_id, "WA" AS state, 20 AS cost, "clothing" AS item_type
  UNION ALL
  SELECT 1 AS customer_id, 102 AS order_id, "WA" AS state, 3 AS cost, "food" AS item_type
  UNION ALL
  SELECT 2 AS customer_id, 103 AS order_id, "NY" AS state, 16 AS cost, "clothing" AS item_type
  UNION ALL
  SELECT 2 AS customer_id, 104 AS order_id, "NY" AS state, 22 AS cost, "housewares" AS item_type
  UNION ALL
  SELECT 2 AS customer_id, 104 AS order_id, "WA" AS state, 45 AS cost, "clothing" AS item_type
  UNION ALL
  SELECT 3 AS customer_id, 105 AS order_id, "MI" AS state, 29 AS cost, "clothing" AS item_type);

Suppose you want to know, for each state and item type, the average amount spent by repeat customers. You could write the query in the following way:

SELECT state, item_type, AVG(total_cost) AS average
FROM
  (
    SELECT
      SUM(cost) AS total_cost,
      customer_id,
      state,
      item_type,
      COUNT(*) OVER (PARTITION BY customer_id) AS num_orders
    FROM mydataset.customer_orders
    GROUP BY customer_id, state, item_type
    QUALIFY num_orders > 1
  )
GROUP BY state, item_type
ORDER BY state DESC, item_type ASC;

If you read the query from top to bottom, you encounter the column total_cost before it has been defined. Even within the subquery, you read the names of columns before you see which table they come from.

To make sense of this query, it needs to be read from the inside out. The columns state and item_type are repeated numerous times in the SELECT and GROUP BY clauses, then again in the ORDER BY clause.

The following equivalent query is written using pipe syntax:

FROM mydataset.customer_orders
|> AGGREGATE SUM(cost) AS total_cost, GROUP BY customer_id, state, item_type
|> EXTEND COUNT(*) OVER (PARTITION BY customer_id) AS num_orders
|> WHERE num_orders > 1
|> AGGREGATE AVG(total_cost) AS average GROUP BY state DESC, item_type ASC;

/*-------+------------+---------+
 | state | item_type  | average |
 +-------+------------+---------+
 | WA    | clothing   | 35.0    |
 | WA    | food       | 3.0     |
 | NY    | clothing   | 16.0    |
 | NY    | housewares | 22.0    |
 +-------+------------+---------*/

With pipe syntax, you can write the query to follow the logical steps you might think through to solve the original problem. The lines of syntax in the query correspond to the following logical steps:

  • Start with the table of customer orders.
  • Find out how much each customer spent on each type of item by state.
  • Count the number of orders for each customer.
  • Restrict the results to repeat customers.
  • Find the average amount that repeat customers spend for each state and item type.

Limitations

  • You can't include a differential privacy clause in a SELECT statement following a pipe operator. Instead, use a differential privacy clause in standard syntax and apply pipe operators following the query.
  • You can't use a named window in pipe syntax.

What's next