INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data. You can query the INFORMATION_SCHEMA streaming views to retrieve historical and real-time information about data streamed into BigQuery. These views contain per-minute aggregated statistics for each table that has data streamed into it. For more information about streaming data into BigQuery, see Streaming data into BigQuery.
Required permissions
Retrieving streaming metadata by using the INFORMATION_SCHEMA views requires appropriately scoped permissions:

- STREAMING_TIMELINE_BY_PROJECT requires bigquery.tables.list for the project and is available to the BigQuery User, BigQuery Data Viewer, BigQuery Data Editor, BigQuery Data Owner, BigQuery Metadata Viewer, BigQuery Resource Admin, and BigQuery Admin roles.
- STREAMING_TIMELINE_BY_FOLDER requires bigquery.tables.list for the parent folder of the current project and is available to the BigQuery User, BigQuery Data Viewer, BigQuery Data Editor, BigQuery Data Owner, BigQuery Metadata Viewer, BigQuery Resource Admin, and BigQuery Admin roles.
- STREAMING_TIMELINE_BY_ORGANIZATION requires bigquery.tables.list for the organization and is available to the BigQuery User, BigQuery Data Viewer, BigQuery Data Editor, BigQuery Data Owner, BigQuery Metadata Viewer, BigQuery Resource Admin, and BigQuery Admin roles.
Schemas
When you query the INFORMATION_SCHEMA streaming views, the query results contain historical and real-time information about data streamed into BigQuery. Each row in the following views represents statistics for streaming into a specific table, aggregated over a one-minute interval starting at start_timestamp. Statistics are grouped by error code, so there is one row for each error code encountered during the one-minute interval for each timestamp and table combination. Successful requests have the error code set to NULL. If no data was streamed into a table during a certain time period, then no rows are present for the corresponding timestamps for that table.
- INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT contains per-minute aggregated streaming statistics for the current project.
- INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_FOLDER contains per-minute aggregated streaming statistics for the parent folder of the current project, including its subfolders.
- INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION contains per-minute aggregated streaming statistics for the whole organization associated with the current project.
The INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT, INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_FOLDER, and INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION views have the following schema:
| Column name | Data type | Value |
|---|---|---|
| start_timestamp | TIMESTAMP | (Partitioning column) Start timestamp of the one-minute interval for the aggregated statistics. |
| project_id | STRING | (Clustering column) ID of the project. |
| project_number | INTEGER | Number of the project. |
| dataset_id | STRING | (Clustering column) ID of the dataset. |
| table_id | STRING | (Clustering column) ID of the table. |
| error_code | STRING | Error code returned for the requests specified by this row. NULL for successful requests. |
| total_requests | INTEGER | Total number of requests within the one-minute interval. |
| total_rows | INTEGER | Total number of rows from all requests within the one-minute interval. |
| total_input_bytes | INTEGER | Total number of bytes from all rows within the one-minute interval. |
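As a minimal illustration of this schema, the following sketch reads the raw per-minute rows for a single table in the US multi-region. The dataset and table names (`my_dataset`, `my_table`) are placeholders; substitute your own.

```sql
-- Raw per-minute streaming statistics for one table (names are placeholders).
SELECT
  start_timestamp,
  error_code,
  total_requests,
  total_rows,
  total_input_bytes
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  dataset_id = 'my_dataset'
  AND table_id = 'my_table'
ORDER BY
  start_timestamp DESC
```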
Data retention
Currently, only the last 180 days of streaming history is retained in the INFORMATION_SCHEMA streaming views.
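Because older rows are dropped, a query that aggregates over the entire view covers at most the trailing 180 days. To make that window explicit, you can filter on start_timestamp, as in this sketch (assuming the US multi-region):

```sql
-- Aggregate streaming requests over the full retained window (at most 180 days).
SELECT
  MIN(start_timestamp) AS earliest_minute,
  SUM(total_requests) AS total_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
```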
Regionality
BigQuery's INFORMATION_SCHEMA streaming views are regionalized. To query these views, you must prefix the view name with a supported region, in the format `region-region-name`.INFORMATION_SCHEMA.view.
For example:
- To query data in the US multi-region, use `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
- To query data in the EU multi-region, use `region-eu`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
- To query data in the asia-northeast1 region, use `region-asia-northeast1`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
For a list of available regions, see Dataset locations.
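Combining the region qualifier with an explicit project ID gives a fully qualified view reference. The following sketch shows the form; `myproject` is a placeholder project ID:

```sql
-- Fully qualified reference: explicit project ID plus region qualifier.
SELECT
  COUNT(*) AS num_rows
FROM
  `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
```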
Examples
Example 1: Recent streaming failures
The following example calculates the per-minute breakdown of total failed requests for all tables in the project in the last 30 minutes, split by error code.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.
To run the query:
Console
In the Cloud Console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query.
INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  start_timestamp,
  error_code,
  SUM(total_requests) AS num_failed_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  error_code IS NOT NULL
  AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE)
GROUP BY
  start_timestamp,
  error_code
ORDER BY
  1 DESC
```
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
```sh
bq query --nouse_legacy_sql \
'SELECT
   start_timestamp,
   error_code,
   SUM(total_requests) AS num_failed_requests
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 WHERE
   error_code IS NOT NULL
   AND start_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 30 MINUTE)
 GROUP BY
   start_timestamp,
   error_code
 ORDER BY
   1 DESC'
```
The results should look like the following:
```
+---------------------+------------------+---------------------+
|   start_timestamp   |    error_code    | num_failed_requests |
+---------------------+------------------+---------------------+
| 2020-04-15 20:55:00 | INTERNAL_ERROR   |                  41 |
| 2020-04-15 20:41:00 | CONNECTION_ERROR |                   5 |
| 2020-04-15 20:30:00 | INTERNAL_ERROR   |                 115 |
+---------------------+------------------+---------------------+
```
Example 2: Per minute breakdown for all requests with error codes
The following example calculates a per-minute breakdown of successful and failed streaming requests, split into error code categories. This query could be used to populate a dashboard.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.
To run the query:
Console
In the Cloud Console, open the BigQuery page.
Enter the following standard SQL query in the Query editor box.
INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  start_timestamp,
  SUM(total_requests) AS total_requests,
  SUM(total_rows) AS total_rows,
  SUM(total_input_bytes) AS total_input_bytes,
  SUM(IF(error_code IN ('QUOTA_EXCEEDED', 'RATE_LIMIT_EXCEEDED'),
      total_requests, 0)) AS quota_error,
  SUM(IF(error_code IN ('INVALID_VALUE', 'NOT_FOUND', 'SCHEMA_INCOMPATIBLE',
      'BILLING_NOT_ENABLED', 'ACCESS_DENIED', 'UNAUTHENTICATED'),
      total_requests, 0)) AS user_error,
  SUM(IF(error_code IN ('CONNECTION_ERROR', 'INTERNAL_ERROR'),
      total_requests, 0)) AS server_error,
  SUM(IF(error_code IS NULL, 0, total_requests)) AS total_error
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
GROUP BY
  start_timestamp
ORDER BY
  1 DESC
```
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
```sh
bq query --nouse_legacy_sql \
'SELECT
   start_timestamp,
   SUM(total_requests) AS total_requests,
   SUM(total_rows) AS total_rows,
   SUM(total_input_bytes) AS total_input_bytes,
   SUM(IF(error_code IN ("QUOTA_EXCEEDED", "RATE_LIMIT_EXCEEDED"),
       total_requests, 0)) AS quota_error,
   SUM(IF(error_code IN ("INVALID_VALUE", "NOT_FOUND", "SCHEMA_INCOMPATIBLE",
       "BILLING_NOT_ENABLED", "ACCESS_DENIED", "UNAUTHENTICATED"),
       total_requests, 0)) AS user_error,
   SUM(IF(error_code IN ("CONNECTION_ERROR", "INTERNAL_ERROR"),
       total_requests, 0)) AS server_error,
   SUM(IF(error_code IS NULL, 0, total_requests)) AS total_error
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 GROUP BY
   start_timestamp
 ORDER BY
   1 DESC'
```
The results should look like the following:
```
+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
|   start_timestamp   | total_requests | total_rows | total_input_bytes | quota_error | user_error | server_error | total_error |
+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
| 2020-04-15 22:00:00 |         441854 |     441854 |       23784853118 |           0 |          0 |           17 |          17 |
| 2020-04-15 21:59:00 |         355627 |     355627 |       26101982742 |           0 |          0 |            0 |           0 |
| 2020-04-15 21:58:00 |         354603 |     354603 |       26160565341 |           0 |          0 |            0 |           0 |
| 2020-04-15 21:57:00 |         298823 |     298823 |       23877821442 |           0 |          0 |            0 |           0 |
+---------------------+----------------+------------+-------------------+-------------+------------+--------------+-------------+
```
Example 3: Tables with the most incoming traffic
The following example returns the streaming statistics for the 10 tables with the most incoming traffic.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.
To run the query:
Console
In the Cloud Console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query.
INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  project_id,
  dataset_id,
  table_id,
  SUM(total_rows) AS num_rows,
  SUM(total_input_bytes) AS num_bytes,
  SUM(total_requests) AS num_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
GROUP BY
  1, 2, 3
ORDER BY
  num_bytes DESC
LIMIT 10
```
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
```sh
bq query --nouse_legacy_sql \
'SELECT
   project_id,
   dataset_id,
   table_id,
   SUM(total_rows) AS num_rows,
   SUM(total_input_bytes) AS num_bytes,
   SUM(total_requests) AS num_requests
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 GROUP BY
   1, 2, 3
 ORDER BY
   num_bytes DESC
 LIMIT 10'
```
The results should look like the following:
```
+------------+------------+----------+------------+----------------+--------------+
| project_id | dataset_id | table_id |  num_rows  |   num_bytes    | num_requests |
+------------+------------+----------+------------+----------------+--------------+
| my-project | dataset1   | table1   | 8016725532 | 73787301876979 |   8016725532 |
| my-project | dataset1   | table2   |   26319580 | 34199853725409 |     26319580 |
| my-project | dataset2   | table1   |   38355294 | 22879180658120 |     38355294 |
| my-project | dataset1   | table3   |  270126906 | 17594235226765 |    270126906 |
| my-project | dataset2   | table2   |   95511309 | 17376036299631 |     95511309 |
| my-project | dataset2   | table3   |   46500443 | 12834920497777 |     46500443 |
| my-project | dataset2   | table4   |   25846270 |  7487917957360 |     25846270 |
| my-project | dataset1   | table4   |   18318404 |  5665113765882 |     18318404 |
| my-project | dataset1   | table5   |   42829431 |  5343969665771 |     42829431 |
| my-project | dataset1   | table6   |    8771021 |  5119004622353 |      8771021 |
+------------+------------+----------+------------+----------------+--------------+
```
Example 4: Streaming error ratio for a table
The following example calculates a per-day breakdown of errors for a specific table, split by error code.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT.
To run the query:
Console
In the Cloud Console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query.
INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  TIMESTAMP_TRUNC(start_timestamp, DAY) AS day,
  project_id,
  dataset_id,
  table_id,
  error_code,
  SUM(total_rows) AS num_rows,
  SUM(total_input_bytes) AS num_bytes,
  SUM(total_requests) AS num_requests
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  table_id LIKE 'my_table'
GROUP BY
  project_id, dataset_id, table_id, error_code, day
ORDER BY
  day, project_id, dataset_id DESC
```
Click Run.
bq
Use the query
command and specify standard SQL syntax by using the
--nouse_legacy_sql
or --use_legacy_sql=false
flag. Standard SQL syntax
is required for INFORMATION_SCHEMA
queries.
To run the query, enter:
```sh
bq query --nouse_legacy_sql \
'SELECT
   TIMESTAMP_TRUNC(start_timestamp, DAY) AS day,
   project_id,
   dataset_id,
   table_id,
   error_code,
   SUM(total_rows) AS num_rows,
   SUM(total_input_bytes) AS num_bytes,
   SUM(total_requests) AS num_requests
 FROM
   `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
 WHERE
   table_id LIKE "my_table"
 GROUP BY
   project_id, dataset_id, table_id, error_code, day
 ORDER BY
   day, project_id, dataset_id DESC'
```
The results should look like the following:
```
+---------------------+------------+------------+----------+----------------+----------+-----------+--------------+
|         day         | project_id | dataset_id | table_id |   error_code   | num_rows | num_bytes | num_requests |
+---------------------+------------+------------+----------+----------------+----------+-----------+--------------+
| 2020-04-21 00:00:00 | my_project | my_dataset | my_table | NULL           |       41 |    252893 |           41 |
| 2020-04-20 00:00:00 | my_project | my_dataset | my_table | NULL           |     2798 |  10688286 |         2798 |
| 2020-04-19 00:00:00 | my_project | my_dataset | my_table | NULL           |     2005 |   7979495 |         2005 |
| 2020-04-18 00:00:00 | my_project | my_dataset | my_table | NULL           |     2054 |   7972378 |         2054 |
| 2020-04-17 00:00:00 | my_project | my_dataset | my_table | NULL           |     2056 |   6978079 |         2056 |
| 2020-04-17 00:00:00 | my_project | my_dataset | my_table | INTERNAL_ERROR |        4 |     10825 |            4 |
+---------------------+------------+------------+----------+----------------+----------+-----------+--------------+
```
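The per-day error counts can also be expressed as an actual ratio of failed requests to all requests. This sketch divides the failed-request count by the total, using SAFE_DIVIDE to avoid division by zero; my_table is the same placeholder table name used above:

```sql
-- Daily streaming error ratio: failed requests divided by all requests.
SELECT
  TIMESTAMP_TRUNC(start_timestamp, DAY) AS day,
  SAFE_DIVIDE(
    SUM(IF(error_code IS NOT NULL, total_requests, 0)),
    SUM(total_requests)) AS error_ratio
FROM
  `region-us`.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
WHERE
  table_id = 'my_table'
GROUP BY
  day
ORDER BY
  day DESC
```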
Next steps
- For an overview of INFORMATION_SCHEMA, see Introduction to BigQuery INFORMATION_SCHEMA.
- Learn how to use INFORMATION_SCHEMA to get job metadata.
- Learn how to use INFORMATION_SCHEMA to get reservations metadata.
- Learn how to use INFORMATION_SCHEMA to get dataset metadata.
- Learn how to use INFORMATION_SCHEMA to get table metadata.
- Learn how to use INFORMATION_SCHEMA to get view metadata.