Getting streaming metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA streaming views to retrieve historical and real-time information about streaming data into BigQuery. These views contain per-minute aggregated statistics for each table that has data streamed into it. For more information about streaming data into BigQuery, see Streaming data into BigQuery.

Required permissions

Retrieving streaming metadata by using the INFORMATION_SCHEMA views requires appropriately scoped permissions:

  • STREAMING_TIMELINE_BY_PROJECT requires bigquery.tables.list for the project and is available to the BigQuery User, BigQuery Data Viewer, BigQuery Data Editor, BigQuery Data Owner, BigQuery Metadata Viewer, BigQuery Resource Admin, and BigQuery Admin roles.
  • STREAMING_TIMELINE_BY_ORGANIZATION requires bigquery.tables.list for the organization and is available to the BigQuery User, BigQuery Data Viewer, BigQuery Data Editor, BigQuery Data Owner, BigQuery Metadata Viewer, BigQuery Resource Admin, and BigQuery Admin roles.

Schemas

When you query the INFORMATION_SCHEMA streaming views, the query results contain historical and real-time information about streaming data into BigQuery. Each row in the following views represents statistics for streaming into a specific table, aggregated over a one-minute interval starting at start_timestamp. Statistics are grouped by error code, so there is one row for each error code encountered during the one-minute interval for each timestamp and table combination. Successful requests have the error code set to NULL. If no data was streamed into a table during a certain time period, then no rows are present for the corresponding timestamps for that table.

  • INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT contains per-minute aggregated streaming statistics for the current project.
  • INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION contains per-minute aggregated streaming statistics for the whole organization associated with the current project.

The INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT and INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION views have the following schema:

Column name | Data type | Value
start_timestamp | TIMESTAMP | (Partitioning column) Start timestamp of the one-minute interval for the aggregated statistics.
project_id | STRING | (Clustering column) ID of the project.
project_number | INTEGER | Number of the project.
dataset_id | STRING | (Clustering column) ID of the dataset.
table_id | STRING | (Clustering column) ID of the table.
error_code | STRING | Error code returned for the requests specified by this row. NULL for successful requests.
total_requests | INTEGER | Total number of requests within the one-minute interval.
total_rows | INTEGER | Total number of rows from all requests within the one-minute interval.
total_input_bytes | INTEGER | Total number of bytes from all rows within the one-minute interval.

Data retention

Currently, only the last 180 days of streaming history is retained in the INFORMATION_SCHEMA streaming views.

Regionality

BigQuery's INFORMATION_SCHEMA streaming views are regionalized. To query these views, you must prefix the view name with a supported region in the format `region-region-name`.INFORMATION_SCHEMA.view.

For example:

  • To query data in the US multi-region, use `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
  • To query data in the EU multi-region, use `region-eu`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
  • To query data in the asia-northeast1 region, use `region-asia-northeast1`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT

For a list of available regions, see Dataset locations.
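
As a quick illustration, the region qualifier can also be assembled programmatically. The following Python helper is a sketch of our own (it is not part of any BigQuery client library):

```python
def streaming_view(region, view="STREAMING_TIMELINE_BY_PROJECT", project=None):
    """Build a region-qualified INFORMATION_SCHEMA view reference.

    Pass project to target a project other than your default project.
    """
    prefix = f"`{project}`." if project else ""
    return f"{prefix}`region-{region}`.INFORMATION_SCHEMA.{view}"

print(streaming_view("us"))
# `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
print(streaming_view("asia-northeast1", project="myproject"))
# `myproject`.`region-asia-northeast1`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
```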

Examples

Example 1: Recent streaming failures

The following example calculates the per-minute breakdown of total failed requests for all tables in the project over the last 30 minutes, split by error code.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     start_timestamp,
     error_code,
     SUM(total_requests) AS num_failed_requests
    FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
    WHERE
     error_code IS NOT NULL
     AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE)
    GROUP BY
     start_timestamp,
     error_code
    ORDER BY
     1 DESC
    
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     start_timestamp,
     error_code,
     SUM(total_requests) AS num_failed_requests
   FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
   WHERE
     error_code IS NOT NULL
     AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE)
   GROUP BY
     start_timestamp,
     error_code
   ORDER BY
     1 DESC'

The results should look like the following:

  +---------------------+------------------+---------------------+
  |   start_timestamp   |    error_code    | num_failed_requests |
  +---------------------+------------------+---------------------+
  | 2020-04-15 20:55:00 | INTERNAL_ERROR   |                  41 |
  | 2020-04-15 20:41:00 | CONNECTION_ERROR |                   5 |
  | 2020-04-15 20:30:00 | INTERNAL_ERROR   |                 115 |
  +---------------------+------------------+---------------------+
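
The shape of this aggregation can be illustrated in plain Python over rows shaped like the view; the sample values below are invented, not real view output:

```python
from collections import defaultdict

# Hypothetical per-minute view rows: (start_timestamp, error_code, total_requests).
# error_code is None for successful requests, mirroring NULL in the view.
rows = [
    ("2020-04-15 20:55:00", "INTERNAL_ERROR", 41),
    ("2020-04-15 20:55:00", None, 1200),
    ("2020-04-15 20:41:00", "CONNECTION_ERROR", 5),
    ("2020-04-15 20:30:00", "INTERNAL_ERROR", 115),
]

# WHERE error_code IS NOT NULL ... GROUP BY start_timestamp, error_code
num_failed_requests = defaultdict(int)
for ts, code, requests in rows:
    if code is not None:
        num_failed_requests[(ts, code)] += requests

# ORDER BY start_timestamp DESC
for (ts, code), n in sorted(num_failed_requests.items(), reverse=True):
    print(ts, code, n)
```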
  

Example 2: Per-minute breakdown for all requests with error codes

The following example calculates a per-minute breakdown of successful and failed streaming requests, split into error code categories. This query could be used to populate a dashboard.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     start_timestamp,
     SUM(total_requests) AS total_requests,
     SUM(total_rows) AS total_rows,
     SUM(total_input_bytes) AS total_input_bytes,
     SUM(IF(error_code IN ('QUOTA_EXCEEDED', 'RATE_LIMIT_EXCEEDED'),
         total_requests, 0)) AS quota_error,
     SUM(IF(error_code IN ('INVALID_VALUE', 'NOT_FOUND', 'SCHEMA_INCOMPATIBLE',
                           'BILLING_NOT_ENABLED', 'ACCESS_DENIED', 'UNAUTHENTICATED'),
         total_requests, 0)) AS user_error,
     SUM(IF(error_code IN ('CONNECTION_ERROR','INTERNAL_ERROR'),
         total_requests, 0)) AS server_error,
     SUM(IF(error_code IS NULL, 0, total_requests)) AS total_error
    FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
    GROUP BY
     start_timestamp
    ORDER BY
     1 DESC
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     start_timestamp,
     SUM(total_requests) AS total_requests,
     SUM(total_rows) AS total_rows,
     SUM(total_input_bytes) AS total_input_bytes,
     SUM(IF(error_code IN ("QUOTA_EXCEEDED", "RATE_LIMIT_EXCEEDED"),
         total_requests, 0)) AS quota_error,
     SUM(IF(error_code IN ("INVALID_VALUE", "NOT_FOUND", "SCHEMA_INCOMPATIBLE",
                           "BILLING_NOT_ENABLED", "ACCESS_DENIED", "UNAUTHENTICATED"),
         total_requests, 0)) AS user_error,
     SUM(IF(error_code IN ("CONNECTION_ERROR", "INTERNAL_ERROR"),
         total_requests, 0)) AS server_error,
     SUM(IF(error_code IS NULL, 0, total_requests)) AS total_error
   FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
   GROUP BY
     start_timestamp
   ORDER BY
     1 DESC'

The results should look like the following:

+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
|   start_timestamp   | total_requests | total_rows | total_input_bytes | quota_error | user_error | server_error | total_error |
+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
| 2020-04-15 22:00:00 |         441854 |     441854 |       23784853118 |           0 |          0 |           17 |          17 |
| 2020-04-15 21:59:00 |         355627 |     355627 |       26101982742 |           0 |          0 |            0 |           0 |
| 2020-04-15 21:58:00 |         354603 |     354603 |       26160565341 |           0 |          0 |            0 |           0 |
| 2020-04-15 21:57:00 |         298823 |     298823 |       23877821442 |           0 |          0 |            0 |           0 |
+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
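
When post-processing results, for example in a dashboard backend, the same error-code buckets can be expressed as a small classifier. This Python sketch mirrors the query's aliases; the bucket names are ours, not an official taxonomy:

```python
QUOTA_ERRORS = {"QUOTA_EXCEEDED", "RATE_LIMIT_EXCEEDED"}
USER_ERRORS = {"INVALID_VALUE", "NOT_FOUND", "SCHEMA_INCOMPATIBLE",
               "BILLING_NOT_ENABLED", "ACCESS_DENIED", "UNAUTHENTICATED"}
SERVER_ERRORS = {"CONNECTION_ERROR", "INTERNAL_ERROR"}

def categorize(error_code):
    """Map a view error_code to the bucket used by the dashboard query."""
    if error_code is None:        # NULL: the request succeeded
        return "success"
    if error_code in QUOTA_ERRORS:
        return "quota_error"
    if error_code in USER_ERRORS:
        return "user_error"
    if error_code in SERVER_ERRORS:
        return "server_error"
    return "other_error"          # any code outside the buckets above
```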
  

Example 3: Tables with the most incoming traffic

The following example returns the streaming statistics for the 10 tables with the most incoming traffic.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     project_id,
     dataset_id,
     table_id,
     SUM(total_rows) AS num_rows,
     SUM(total_input_bytes) AS num_bytes,
     SUM(total_requests) AS num_requests
    FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
    GROUP BY 1, 2, 3
    ORDER BY num_bytes DESC
    LIMIT 10
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   project_id,
   dataset_id,
   table_id,
   SUM(total_rows) AS num_rows,
   SUM(total_input_bytes) AS num_bytes,
   SUM(total_requests) AS num_requests
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 GROUP BY 1, 2, 3
 ORDER BY num_bytes DESC
 LIMIT 10'

The results should look like the following:

  +----------------------+------------+-------------------------------+------------+----------------+--------------+
  |      project_id      | dataset_id |           table_id            |  num_rows  |   num_bytes    | num_requests |
  +----------------------+------------+-------------------------------+------------+----------------+--------------+
  | my-project           | dataset1   | table1                        | 8016725532 | 73787301876979 |   8016725532 |
  | my-project           | dataset1   | table2                        |   26319580 | 34199853725409 |     26319580 |
  | my-project           | dataset2   | table1                        |   38355294 | 22879180658120 |     38355294 |
  | my-project           | dataset1   | table3                        |  270126906 | 17594235226765 |    270126906 |
  | my-project           | dataset2   | table2                        |   95511309 | 17376036299631 |     95511309 |
  | my-project           | dataset2   | table3                        |   46500443 | 12834920497777 |     46500443 |
  | my-project           | dataset2   | table4                        |   25846270 |  7487917957360 |     25846270 |
  | my-project           | dataset1   | table4                        |   18318404 |  5665113765882 |     18318404 |
  | my-project           | dataset1   | table5                        |   42829431 |  5343969665771 |     42829431 |
  | my-project           | dataset1   | table6                        |    8771021 |  5119004622353 |      8771021 |
  +----------------------+------------+-------------------------------+------------+----------------+--------------+
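
The GROUP BY / ORDER BY / LIMIT logic of this query can be sketched in Python over hypothetical rows (the values are made up):

```python
# Hypothetical view rows: (project_id, dataset_id, table_id,
#                          total_rows, total_input_bytes, total_requests).
rows = [
    ("my-project", "dataset1", "table1", 100, 5_000, 100),
    ("my-project", "dataset1", "table1", 200, 7_000, 200),
    ("my-project", "dataset2", "table1",  50, 9_000,  50),
]

# GROUP BY project_id, dataset_id, table_id, summing each metric
totals = {}
for proj, ds, tbl, n_rows, n_bytes, n_req in rows:
    key = (proj, ds, tbl)
    r, b, q = totals.get(key, (0, 0, 0))
    totals[key] = (r + n_rows, b + n_bytes, q + n_req)

# ORDER BY num_bytes DESC LIMIT 10
top = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)[:10]
```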
  

Example 4: Streaming error ratio for a table

The following example calculates a per-day breakdown of errors for a specific table, split by error code.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     TIMESTAMP_TRUNC(start_timestamp, DAY) AS day,
     project_id,
     dataset_id,
     table_id,
     error_code,
     SUM(total_rows) AS num_rows,
     SUM(total_input_bytes) AS num_bytes,
     SUM(total_requests) AS num_requests
    FROM
     `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
    WHERE table_id LIKE "my_table"
    GROUP BY project_id, dataset_id, table_id, error_code, day
    ORDER BY day, project_id, dataset_id DESC
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   TIMESTAMP_TRUNC(start_timestamp, DAY) AS day,
   project_id,
   dataset_id,
   table_id,
   error_code,
   SUM(total_rows) AS num_rows,
   SUM(total_input_bytes) AS num_bytes,
   SUM(total_requests) AS num_requests
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 WHERE table_id LIKE "my_table"
 GROUP BY project_id, dataset_id, table_id, error_code, day
 ORDER BY day, project_id, dataset_id DESC'

The results should look like the following:

+---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+
|         day         |  project_id | dataset_id | table_id |   error_code   | num_rows | num_bytes | num_requests |
+---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+
| 2020-04-21 00:00:00 | my_project  | my_dataset | my_table | NULL           |       41 |    252893 |           41 |
| 2020-04-20 00:00:00 | my_project  | my_dataset | my_table | NULL           |     2798 |  10688286 |         2798 |
| 2020-04-19 00:00:00 | my_project  | my_dataset | my_table | NULL           |     2005 |   7979495 |         2005 |
| 2020-04-18 00:00:00 | my_project  | my_dataset | my_table | NULL           |     2054 |   7972378 |         2054 |
| 2020-04-17 00:00:00 | my_project  | my_dataset | my_table | NULL           |     2056 |   6978079 |         2056 |
| 2020-04-17 00:00:00 | my_project  | my_dataset | my_table | INTERNAL_ERROR |        4 |     10825 |            4 |
+---------------------+-------------+------------+----------+----------------+----------+-----------+--------------+
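
The query returns per-day counts split by error code; the error ratio itself is a small follow-up computation. The Python sketch below runs over invented rows shaped like the result above:

```python
from collections import defaultdict

# Hypothetical query results: (day, error_code, num_requests).
# error_code is None for successful requests, mirroring NULL.
daily_rows = [
    ("2020-04-17", None, 2056),
    ("2020-04-17", "INTERNAL_ERROR", 4),
    ("2020-04-18", None, 2054),
]

total = defaultdict(int)
failed = defaultdict(int)
for day, code, n in daily_rows:
    total[day] += n
    if code is not None:
        failed[day] += n

for day in sorted(total):
    print(f"{day}: {failed[day] / total[day]:.2%} of {total[day]} requests failed")
```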
  

Next steps