Working with JSON data in Standard SQL

BigQuery natively supports JSON data using the JSON data type.

This document describes how to create a table with a JSON column, insert JSON data into a BigQuery table, and query JSON data.

Overview

JSON is a widely used format that allows for semi-structured data, because it does not require a schema. Applications can use a "schema-on-read" approach, where the application ingests the data and then queries based on assumptions about the schema of that data. This approach differs from the STRUCT type in BigQuery, which requires a fixed schema that is enforced for all values stored in a column of STRUCT type.

By using the JSON data type, you can ingest semi-structured JSON into BigQuery without providing a schema for the JSON data upfront. This lets you store and query data that doesn’t always adhere to fixed schemas and data types. By ingesting JSON data as a JSON data type, BigQuery can encode and process each JSON field individually. You can then query the values of fields and array elements within the JSON data by using the field access operator, which makes JSON queries easy to use and cost efficient.

Limitations

  • If you use a batch load job to ingest JSON data into a table, the source data must be in CSV format. Other batch load formats are not supported.
  • You can't define JSON data type columns on external tables.
  • The JSON data type has a nesting limit of 500.
  • You can't use legacy SQL to query a table that contains JSON types.
  • In the Google Cloud Console, the SQL editor does not support auto-completion of JSON-related keywords.
  • Google Data Studio does not support tables that contain JSON types.

Create a table with a JSON column

You can create an empty table with a JSON column by using SQL or by using the bq command-line tool.

SQL

In Standard SQL, use the CREATE TABLE statement and declare a column with the JSON type.

CREATE TABLE mydataset.table1(
  id INT64,
  cart JSON
);

bq

Use the bq mk command and provide a table schema with a JSON data type.

bq mk --table mydataset.table1 id:INT64,cart:JSON

You can't partition or cluster a table on JSON columns, because the equality and comparison operators are not defined on the JSON type.

Create JSON values

You can create JSON values in the following ways:

  • Use SQL to create a JSON literal.
  • Use the PARSE_JSON function to convert a string to a JSON type.
  • Use the TO_JSON function to convert a SQL type to a JSON type.

Create a JSON literal

The following example uses a DML statement to insert a JSON literal into a table:

INSERT INTO mydataset.table1
VALUES(1, JSON '{"name": "Alice", "age": 30}');

Convert a string to JSON

The following example uses the PARSE_JSON function to convert JSON data stored as a string to a JSON type. The example converts a column from an existing table and stores the result in a new table.

CREATE OR REPLACE TABLE mydataset.table_new AS
SELECT id, SAFE.PARSE_JSON(cart) AS cart_json
FROM mydataset.old_table;

The SAFE prefix used in this example causes PARSE_JSON to return NULL for any string that fails to parse, instead of raising an error.
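Client-side, a rough analogue of SAFE.PARSE_JSON is to try parsing each string and treat failures as missing values. A minimal Python sketch (the helper name is hypothetical):

```python
import json

def is_valid_json(s):
    """Return True if s parses as JSON; failures become False instead of errors."""
    try:
        json.loads(s)
        return True
    except (TypeError, ValueError):
        return False

rows = ['{"id": 1}', 'not json', '[1, 2, 3]']
print([is_valid_json(r) for r in rows])  # [True, False, True]
```

Pre-checking strings this way can help you estimate how many rows would become NULL before running the conversion.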

Convert a SQL type to JSON

The following example converts a SQL STRUCT value to a JSON type, by using the TO_JSON function:

SELECT TO_JSON(STRUCT(1 AS id, [10,20] AS coordinates)) AS pt;

| pt                             |
|--------------------------------|
| {"coordinates":[10,20],"id":1} |

Ingest JSON data

You can ingest JSON data into a BigQuery table in the following ways:

Use a batch load job

You can use a batch load job to ingest JSON data from a CSV file.

The following example assumes that you have a CSV file named file1.csv that contains the following records:

1,20
2,"""This is a string"""
3,"{""id"": 10, ""name"": ""Alice""}"
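Because the JSON values are embedded in CSV fields, every double quote inside a value must be doubled and the field itself wrapped in quotes. Python's csv module applies these escaping rules automatically; a sketch that reproduces the records above:

```python
import csv
import io

# csv.writer doubles embedded quotes and wraps any field that contains
# commas or quotes, which matches the escaping shown in the records above.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([1, 20])
writer.writerow([2, '"This is a string"'])
writer.writerow([3, '{"id": 10, "name": "Alice"}'])
print(buf.getvalue())
```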

To load this file using the bq command-line tool, use the bq load command:

bq load --source_format=CSV mydataset.table1 file1.csv id:INTEGER,json_data:JSON

To verify the table schema and row count, run the bq show command:

bq show mydataset.table1

Last modified          Schema         Total Rows   Total Bytes
----------------- -------------------- ------------ -------------
 22 Dec 22:10:32   |- id: integer       3            63
                   |- json_data: json

For more information about loading CSV files, see Loading CSV data from Cloud Storage.

Use the Storage Write API

You can use the Storage Write API to ingest JSON data. The following example uses the Storage Write API Python client.

Define a protocol buffer to hold the serialized streaming data. The JSON data is encoded as a string. In the following example, the json_col field holds JSON data.

message SampleData {
  optional string string_col = 1;
  optional int64 int64_col = 2;
  optional string json_col = 3;
}

Format the JSON data for each row as a string value:

row.json_col = '{"a": 10, "b": "bar"}'
row.json_col = '"This is a string"' # The double-quoted string is the JSON value.
row.json_col = '10'
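One way to produce these string values is to serialize native Python values with json.dumps, which yields valid JSON text for objects, strings, and numbers alike. A sketch (the sample values are illustrative):

```python
import json

# Each value assigned to json_col must itself be valid JSON text.
payloads = [
    {"a": 10, "b": "bar"},  # becomes a JSON object
    "This is a string",     # becomes a double-quoted JSON string
    10,                     # becomes a JSON number
]
encoded = [json.dumps(p) for p in payloads]
print(encoded)
```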

Append the rows to the write stream as shown in the code example. The client library handles serialization to protocol buffer format.

Use the legacy streaming API

The following example loads JSON data from a local file and streams it to BigQuery by using the legacy streaming API.

from google.cloud import bigquery
import json

# Read events data from a local file called local_file.json
with open('local_file.json', 'r') as json_file:
    data = json.load(json_file)

# TODO(developer): Replace these variables before running the sample.
project_id = 'MY_PROJECT_ID'
table_id = 'MY_TABLE_ID'

client = bigquery.Client(project=project_id)
table_obj = client.get_table(table_id)

# insert_rows returns an empty list on success, or a list of errors for
# rows that failed to insert.
# https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html?highlight=insertall

errors = client.insert_rows(table=table_obj, rows=data)
if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))

For more information, see Streaming data into BigQuery.

Query JSON data

This section describes how to use Standard SQL to extract values from JSON data. JSON is case-sensitive and supports UTF-8 in both fields and values.

The examples in this section use the following table:

CREATE OR REPLACE TABLE mydataset.table1(id INT64, cart JSON);

INSERT INTO mydataset.table1 VALUES
(1, JSON """{
        "name": "Alice",
        "items": [
            {"product": "book", "price": 10},
            {"product": "food", "price": 5}
        ]
    }"""),
(2, JSON """{
        "name": "Bob",
        "items": [
            {"product": "pen", "price": 20}
        ]
    }""");

Extract values as JSON

Given a JSON type in BigQuery, you can access the fields in a JSON expression by using the field access operator. The following example returns the name field of the cart column.

SELECT cart.name FROM mydataset.table1;

+---------+
|  name   |
+---------+
| "Alice" |
| "Bob"   |
+---------+

To access an array element, use the JSON subscript operator. The following example returns the first element of the items array:

SELECT cart.items[0] AS first_item FROM mydataset.table1;

+-------------------------------+
|          first_item           |
+-------------------------------+
| {"price":10,"product":"book"} |
| {"price":20,"product":"pen"}  |
+-------------------------------+

You can also use the JSON subscript operator to reference the members of a JSON object by name:

SELECT cart['name'] FROM mydataset.table1;

+---------+
|  name   |
+---------+
| "Alice" |
| "Bob"   |
+---------+

For subscript operations, the expression inside the brackets can be any string or integer expression, including a non-constant expression:

DECLARE int_val INT64 DEFAULT 0;

SELECT cart[CONCAT('it','ems')][int_val + 1].product AS item
FROM mydataset.table1;

+--------+
|  item  |
+--------+
| "food" |
| NULL   |
+--------+

Field access and subscript operators both return JSON types, so you can chain expressions that use them or pass the result to other functions that take JSON types.

These operators are syntactic sugar for the JSON_QUERY function. For example, the expression cart.name is equivalent to JSON_QUERY(cart, "$.name").

If a member with the specified name is not found in the JSON object, or if the JSON array doesn't have an element with the specified position, then these operators return SQL NULL.

SELECT cart.address AS address, cart.items[1].price AS item1_price
FROM mydataset.table1;

+---------+-------------+
| address | item1_price |
+---------+-------------+
| NULL    | NULL        |
| NULL    | 5           |
+---------+-------------+

The equality and comparison operators are not defined on the JSON data type. Therefore, you can't use JSON values directly in clauses like GROUP BY or ORDER BY. Instead, use the JSON_VALUE function to extract field values as SQL strings, as described in the next section.

Extract values as strings

The JSON_VALUE function extracts a scalar value and returns it as a SQL string. It returns SQL NULL if the JSON expression doesn't point to a scalar value. The following example returns the name field for each row:

SELECT JSON_VALUE(cart.name) AS name
FROM mydataset.table1;

+-------+
| name  |
+-------+
| Alice |
| Bob   |
+-------+

You can use the JSON_VALUE function in contexts that require equality or comparison, such as WHERE clauses and GROUP BY clauses. The following example shows a WHERE clause that filters against a JSON value:

SELECT cart.items[0] AS first_item
FROM mydataset.table1
WHERE JSON_VALUE(cart.name) = "Alice";

+-------------------------------+
| first_item                    |
+-------------------------------+
| {"price":10,"product":"book"} |
+-------------------------------+

In some cases, you might have to cast the string to another SQL data type. For example:

SELECT cart.name
FROM mydataset.table1
WHERE CAST(JSON_VALUE(cart.items[0].price) AS INT64) > 15;

+-------+
| name  |
+-------+
| "Bob" |
+-------+

Extract arrays from JSON

JSON can contain JSON arrays, which are not directly equivalent to an ARRAY<JSON> type in BigQuery. You can use the following functions to extract a BigQuery ARRAY from JSON:

  • JSON_QUERY_ARRAY: extracts an array and returns it as an ARRAY<JSON>.
  • JSON_VALUE_ARRAY: extracts an array of scalar values and returns it as an ARRAY<STRING>.

The following example uses JSON_QUERY_ARRAY to extract JSON arrays.

SELECT JSON_QUERY_ARRAY(cart.items) AS items
FROM mydataset.table1;

+--------------------------------------------------------------+
| items                                                        |
+--------------------------------------------------------------+
| [{"price":10,"product":"book"},{"price":5,"product":"food"}] |
| [{"price":20,"product":"pen"}]                               |
+--------------------------------------------------------------+

To split an array into its individual elements, use the UNNEST operator, which returns a table with one row for each element in the array. The following example selects the product member from each member of the items array:

SELECT id, JSON_VALUE(item.product) AS product
FROM mydataset.table1, UNNEST(JSON_QUERY_ARRAY(cart.items)) AS item
ORDER BY id;

+----+---------+
| id | product |
+----+---------+
|  1 | book    |
|  1 | food    |
|  2 | pen     |
+----+---------+

The next example is similar but uses the ARRAY_AGG function to aggregate the values back into a SQL array.

SELECT id, ARRAY_AGG(JSON_VALUE(item.product)) AS products
FROM mydataset.table1, UNNEST(JSON_QUERY_ARRAY(cart.items)) AS item
GROUP BY id
ORDER BY id;

+----+-----------------+
| id | products        |
+----+-----------------+
|  1 | ["book","food"] |
|  2 | ["pen"]         |
+----+-----------------+

For more information about arrays, see Working with arrays in Standard SQL.

JSON nulls

The JSON type has a special null value that is different from the SQL NULL. A JSON null is not treated as a SQL NULL value, as the following example shows.

SELECT JSON 'null' IS NULL;

+-------+
| f0_   |
+-------+
| false |
+-------+

When you extract a JSON field with a null value, the behavior depends on the function:

  • The JSON_QUERY function returns a JSON null, because it is a valid JSON value.
  • The JSON_VALUE function returns SQL NULL, because JSON null is not a scalar value.

The following example shows the different behaviors:

SELECT
  json.a AS json_query, -- Equivalent to JSON_QUERY(json, '$.a')
  JSON_VALUE(json, '$.a') AS json_value
FROM (SELECT JSON '{"a": null}' AS json);

+------------+------------+
| json_query | json_value |
+------------+------------+
| null       | NULL       |
+------------+------------+

Troubleshooting

If you run into issues during the Preview phase, follow these suggestions.

I added a JSON column to my table and my workflow doesn't work anymore.

The JSON data type uses a different backend storage format than other SQL data types. If you encounter issues, you might need to revert your table to its previous state. To do so, perform the following steps:

  1. Use the ALTER TABLE DROP COLUMN statement to delete the JSON column.

    Example:

    ALTER TABLE mydataset.table1 DROP COLUMN cart;
    
  2. Rewrite the data into a new table.

    The previous step doesn't rewrite the underlying data, so issues might persist. To completely clean the table of any JSON data, rewrite the table.

    • Option 1: Use the CREATE TABLE statement. Example:

      CREATE OR REPLACE TABLE mydataset.table1
      AS SELECT * FROM mydataset.table1
      
    • Option 2: Run a SELECT * query on the table and save the query results as a new table.

You can combine both of these steps by using a CREATE TABLE ... AS SELECT query and excluding the existing JSON column(s). For example:

CREATE OR REPLACE TABLE mydataset.table1
AS SELECT * EXCEPT (cart) FROM mydataset.table1

These steps should revert the table to its pre-JSON state and resolve any issues with the table. There is no need to opt out of the JSON preview.

If you have any questions or concerns, please contact bq-json-team@google.com.

I want to opt out of the JSON preview

Make sure to drop all JSON columns from your tables or delete tables with JSON data. Then send an email to bq-json-team@google.com, using the same email used to enroll in the JSON preview. Include the following information in the email:

  • Project number(s)
  • Fully qualified names of any tables that contained JSON data. This information is needed to rewrite the underlying files.