INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data. You can query the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs, as well as the last 180 days of history of completed jobs.
Required permissions
Retrieving job metadata by using INFORMATION_SCHEMA views requires appropriately scoped permissions:

- JOBS_BY_USER requires bigquery.jobs.list for the project and is available to the Project Viewer and BigQuery User roles.
- JOBS_BY_PROJECT requires bigquery.jobs.listAll for the project and is available to the Project Owner and BigQuery Admin roles.
- JOBS_BY_FOLDER requires bigquery.jobs.listAll for the parent folder and is available to the Folder Admin and BigQuery Admin roles for the folder.
- JOBS_BY_ORGANIZATION requires bigquery.jobs.listAll for the organization and is available to the Organization bigquery.resourceAdmin, Organization Owner, and Organization Admin roles. Note that JOBS_BY_ORGANIZATION is only available to users with defined Google Cloud organizations.
For more information about granular BigQuery permissions, see roles and permissions.
Schema
When you query the INFORMATION_SCHEMA.JOBS_BY_*
views, the query results
contain one row for each BigQuery job.
- INFORMATION_SCHEMA.JOBS_BY_USER returns only the jobs submitted by the current user in the current project.
- INFORMATION_SCHEMA.JOBS_BY_PROJECT returns all jobs submitted in the current project.
- INFORMATION_SCHEMA.JOBS_BY_FOLDER returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.
- INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION returns all jobs submitted in the organization that is associated with the current project.
The INFORMATION_SCHEMA.JOBS_BY_* views have the following schema:
| Column name | Data type | Value |
|---|---|---|
| creation_time | TIMESTAMP | (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp. |
| project_id | STRING | (Clustering column) ID of the project. |
| project_number | INTEGER | Number of the project. |
| folder_numbers | RECORD | Google Accounts and ID Administration (GAIA) IDs of folders in a project's ancestry, in order starting with the leaf folder closest to the project. This column is only populated in JOBS_BY_FOLDER. |
| user_email | STRING | (Clustering column) Email address or service account of the user who ran the job. |
| job_id | STRING | ID of the job. For example, bquxjob_1234. |
| job_type | STRING | The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as script job statement evaluation or materialized view refresh. |
| statement_type | STRING | The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, or DELETE. |
| priority | STRING | The priority of this job. |
| start_time | TIMESTAMP | Start time of this job. |
| end_time | TIMESTAMP | End time of this job. |
| query | STRING | SQL query text. Note: The JOBS_BY_ORGANIZATION view does not have the query column. |
| state | STRING | Running state of the job. Valid states include PENDING, RUNNING, and DONE. |
| reservation_id | STRING | Name of the primary reservation assigned to this job, if applicable. If your job ran in a project that is assigned to a reservation, it follows this format: reservation-admin-project:reservation-location.reservation-name. |
| total_bytes_processed | INTEGER | Total bytes processed by the job. |
| total_slot_ms | INTEGER | Slot-milliseconds for the job over its entire duration. |
| error_result | RECORD | Details of the error (if any) as an ErrorProto. |
| cache_hit | BOOLEAN | Whether the query results of this job were from a cache. |
| destination_table | RECORD | Destination table for results (if any). |
| referenced_tables | RECORD | Array of tables referenced by the job. |
| labels | RECORD | Array of labels applied to the job as key, value strings. |
| timeline | RECORD | Query timeline of the job. Contains snapshots of query execution. |
| job_stages | RECORD | Query stages of the job. |
| total_bytes_billed | INTEGER | Total bytes billed by the job. |
Data retention
Currently, only the last 180 days of job history is retained in the INFORMATION_SCHEMA jobs views.
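To make the retention window concrete, the following sketch checks whether a job's creation_time still falls inside the 180-day window described above. The function name and structure are illustrative, not part of any BigQuery library:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 180  # job history retained by the INFORMATION_SCHEMA jobs views


def within_retention(creation_time: datetime, now: datetime) -> bool:
    """True if a job's creation_time is still inside the 180-day retention window."""
    return creation_time >= now - timedelta(days=RETENTION_DAYS)


# A job created one month ago is queryable; a job created a year ago is not.
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(within_retention(datetime(2019, 12, 1, tzinfo=timezone.utc), now))  # True
print(within_retention(datetime(2019, 1, 1, tzinfo=timezone.utc), now))   # False
```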
Regionality
BigQuery's INFORMATION_SCHEMA jobs views are regionalized. To query these views, you must use a region qualifier.
Examples
Example 1: Average slot utilization
The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
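As an aside, the qualified view path follows a simple pattern, so it can be assembled programmatically when you build queries for many projects or regions. This helper is our own illustrative sketch, not part of any BigQuery client library:

```python
def qualified_view(project_id: str, region: str, view: str) -> str:
    """Build a fully qualified view path of the form
    `project_id`.`region-region_name`.INFORMATION_SCHEMA.view."""
    return f"`{project_id}`.`region-{region}`.INFORMATION_SCHEMA.{view}"


# Reproduces the example path from the text above.
print(qualified_view("myproject", "us", "JOBS_BY_PROJECT"))
# `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
```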
To run the query:
Console

In the Cloud Console, open the BigQuery page.

In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Filter by the partition column first to limit the amount of data scanned.
  -- Eight days allows for jobs created before the 7 day end_time filter.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
  AND statement_type != "SCRIPT"
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
```

Click Run.
bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

```shell
bq query --nouse_legacy_sql \
'SELECT
   SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_slots
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   -- Filter by the partition column first to limit the amount of data scanned.
   -- Eight days allows for jobs created before the 7 day end_time filter.
   creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
   AND job_type = "QUERY"
   AND statement_type != "SCRIPT"
   AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()'
```
The results should look like the following:
```
+------------+
| avg_slots  |
+------------+
| 3879.1534  |
+------------+
```
You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful to determine percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage from its children jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.

If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).
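To make the arithmetic concrete, here is a small sketch of both calculations: the weekly average (total_slot_ms divided by the number of milliseconds in 7 days, the same denominator as the query above) and the per-job average (total_slot_ms divided by the job's duration in milliseconds). The sample values are made up for illustration:

```python
from datetime import datetime, timezone

# Milliseconds in 7 days: the denominator used by the avg_slots query above.
WEEK_MS = 1000 * 60 * 60 * 24 * 7


def avg_slots_over_week(total_slot_ms: int) -> float:
    """Average slot usage over a 7-day window, mirroring the SQL above."""
    return total_slot_ms / WEEK_MS


def avg_slots_for_job(total_slot_ms: int, start_time: datetime, end_time: datetime) -> float:
    """Per-job average: total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)."""
    duration_ms = (end_time - start_time).total_seconds() * 1000
    return total_slot_ms / duration_ms


# A hypothetical job that consumed 600,000 slot-ms over a 60-second run
# used 10 slots on average.
start = datetime(2019, 10, 10, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2019, 10, 10, 0, 1, 0, tzinfo=timezone.utc)
print(avg_slots_for_job(600_000, start, end))  # 10.0
```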
Example 2: Load job history
The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (up to the last 180 days).
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console

In the Cloud Console, open the BigQuery page.

In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  DISTINCT(user_email) AS user
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = "LOAD"
```

Click Run.
bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

```shell
bq query --nouse_legacy_sql \
'SELECT
   DISTINCT(user_email) AS user
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   job_type = "LOAD"'
```
The results should look like the following:
```
+--------------+
| user         |
+--------------+
| abc@xyz.com  |
+--------------+
| def@xyz.com  |
+--------------+
```
Example 3: Most expensive jobs
The following example demonstrates how to find the five jobs that scanned the
most bytes in an organization for the current day. You can filter further on
statement_type
to query for additional information such as loads, exports,
and queries.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console

In the Cloud Console, open the BigQuery page.

In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  job_id,
  user_email,
  total_bytes_processed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
  EXTRACT(DATE FROM creation_time) = current_date()
ORDER BY
  total_bytes_processed DESC
LIMIT 5
```

Click Run.
bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

```shell
bq query --nouse_legacy_sql \
'SELECT
   job_id,
   user_email,
   total_bytes_processed
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
 WHERE
   EXTRACT(DATE FROM creation_time) = current_date()
 ORDER BY
   total_bytes_processed DESC
 LIMIT 5'
```
The results should look like the following:
```
+--------------+--------------+-----------------------+
| job_id       | user_email   | total_bytes_processed |
+--------------+--------------+-----------------------+
| bquxjob_1    | abc@xyz.com  | 999999                |
+--------------+--------------+-----------------------+
| bquxjob_2    | def@xyz.com  | 888888                |
+--------------+--------------+-----------------------+
| bquxjob_3    | ghi@xyz.com  | 777777                |
+--------------+--------------+-----------------------+
| bquxjob_4    | jkl@xyz.com  | 666666                |
+--------------+--------------+-----------------------+
| bquxjob_5    | mno@xyz.com  | 555555                |
+--------------+--------------+-----------------------+
```
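Because on-demand query pricing is based on bytes scanned, total_bytes_processed translates directly into an approximate cost. The sketch below shows the conversion; the per-TiB rate is a parameter you supply, and the $5 figure in the example is purely hypothetical (check the current BigQuery pricing page for real rates):

```python
def estimated_on_demand_cost(total_bytes_processed: int, usd_per_tib: float) -> float:
    """Estimate on-demand cost: bytes scanned, converted to TiB, times a per-TiB rate.
    The rate is an input, NOT a value taken from any official price list."""
    tib = total_bytes_processed / 2**40  # bytes per tebibyte
    return tib * usd_per_tib


# One full TiB scanned at a hypothetical $5/TiB rate costs $5.
print(estimated_on_demand_cost(2**40, 5.0))  # 5.0
```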
Example 4: Pending and running jobs
The following example lists jobs that were started by the current user and are currently in the pending or running state.
To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console

In the Cloud Console, open the BigQuery page.

In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

```sql
SELECT
  job_id,
  creation_time,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  state != "DONE"
```

Click Run.
bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

```shell
bq query --nouse_legacy_sql \
'SELECT
   job_id,
   creation_time,
   query
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
 WHERE
   state != "DONE"'
```
The results should look like the following:
```
+-----------+-------------------------+--------------------------------+
| job_id    | creation_time           | query                          |
+-----------+-------------------------+--------------------------------+
| bquxjob_1 | 2019-10-10 00:00:00 UTC | SELECT ... FROM dataset.table1 |
+-----------+-------------------------+--------------------------------+
| bquxjob_2 | 2019-10-10 00:00:01 UTC | SELECT ... FROM dataset.table2 |
+-----------+-------------------------+--------------------------------+
| bquxjob_3 | 2019-10-10 00:00:02 UTC | SELECT ... FROM dataset.table3 |
+-----------+-------------------------+--------------------------------+
| bquxjob_4 | 2019-10-10 00:00:03 UTC | SELECT ... FROM dataset.table4 |
+-----------+-------------------------+--------------------------------+
| bquxjob_5 | 2019-10-10 00:00:04 UTC | SELECT ... FROM dataset.table5 |
+-----------+-------------------------+--------------------------------+
```
Next steps

- To see an overview of INFORMATION_SCHEMA, go to Introduction to BigQuery INFORMATION_SCHEMA.
- Learn how to use INFORMATION_SCHEMA to get reservations metadata.
- Learn how to use INFORMATION_SCHEMA to get streaming metadata.
- Learn how to use INFORMATION_SCHEMA to get dataset metadata.
- Learn how to use INFORMATION_SCHEMA to get table metadata.
- Learn how to use INFORMATION_SCHEMA to get view metadata.