Getting jobs metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs, as well as the history of jobs completed in the past 180 days.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To retrieve jobs metadata by using INFORMATION_SCHEMA views, you need the following IAM permissions:

For JOBS_BY_USER, you need bigquery.jobs.list for the project. Each of the following predefined IAM roles includes this permission:
- Project Viewer
- BigQuery User

For JOBS_BY_PROJECT, you need bigquery.jobs.listAll for the project. Each of the following predefined IAM roles includes this permission:
- Project Owner
- BigQuery Admin

For JOBS_BY_FOLDER, you need bigquery.jobs.listAll for the parent folder. Each of the following predefined IAM roles includes this permission:
- Folder Admin
- BigQuery Admin

For JOBS_BY_ORGANIZATION, you need bigquery.jobs.listAll for the organization. Each of the following predefined IAM roles includes this permission:
- Organization bigquery.resourceAdmin
- Organization Owner
- Organization Admin

The JOBS_BY_ORGANIZATION view is only available to users with defined Google Cloud organizations.

For more information about granular BigQuery permissions, see roles and permissions.
Schema
When you query the INFORMATION_SCHEMA.JOBS_BY_* views, the query results contain one row for each BigQuery job.

- INFORMATION_SCHEMA.JOBS_BY_USER returns only the jobs submitted by the current user in the current project.
- INFORMATION_SCHEMA.JOBS_BY_PROJECT returns all jobs submitted in the current project.
- INFORMATION_SCHEMA.JOBS_BY_FOLDER returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.
- INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION returns all jobs submitted in the organization that is associated with the current project.

The INFORMATION_SCHEMA.JOBS_BY_* view has the following schema:
Column name | Data type | Value |
---|---|---|
creation_time | TIMESTAMP | (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp. |
project_id | STRING | (Clustering column) ID of the project. |
project_number | INTEGER | Number of the project. |
folder_numbers | REPEATED INTEGER | Google Accounts and ID Administration (GAIA) IDs of folders in a project's ancestry, in order starting with the leaf folder closest to the project. This column is only populated in JOBS_BY_FOLDER. |
user_email | STRING | (Clustering column) Email address or service account of the user who ran the job. |
job_id | STRING | ID of the job. For example, bquxjob_1234. |
job_type | STRING | The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as a script job statement evaluation or a materialized view refresh. |
statement_type | STRING | The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, DELETE, or SCRIPT. See QueryStatementType for a list of valid values. |
priority | STRING | The priority of this job. Valid values include INTERACTIVE and BATCH. |
start_time | TIMESTAMP | Start time of this job. |
end_time | TIMESTAMP | End time of this job. |
query | STRING | SQL query text. Note: Only the JOBS_BY_PROJECT view has the query column. |
state | STRING | Running state of the job. Valid states include PENDING, RUNNING, and DONE. |
reservation_id | STRING | Name of the primary reservation assigned to this job, if applicable. If your job ran in a project that is assigned to a reservation, it follows this format: RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME, where RESERVATION_ADMIN_PROJECT is the project that administers the reservation, RESERVATION_LOCATION is the location of the reservation, and RESERVATION_NAME is the name of the reservation. |
total_bytes_processed | INTEGER | Total bytes processed by the job. |
total_slot_ms | INTEGER | Slot milliseconds for the job over its entire duration. |
error_result | RECORD | Details of the error, if any, as an ErrorProto. |
cache_hit | BOOLEAN | Whether the query results of this job were from a cache. |
destination_table | RECORD | Destination table for results, if any. |
referenced_tables | RECORD | Array of tables referenced by the job. Only populated for query jobs. |
labels | RECORD | Array of labels applied to the job as key, value strings. |
timeline | RECORD | Query timeline of the job. Contains snapshots of query execution. |
job_stages | RECORD | Query stages of the job. |
total_bytes_billed | INTEGER | If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only. |
parent_job_id | STRING | ID of the parent job, if any. |
transaction_id | STRING | ID of the transaction in which this job ran, if any. (Preview) |
session_info | RECORD | Details about the session in which this job ran, if any. (Preview) |
dml_statistics | RECORD | If the job is a query with a DML statement, the value is a record with the row counts for that statement (for example, the numbers of inserted, deleted, and updated rows). For all other jobs, the value is NULL. |
bi_engine_statistics | RECORD | If the project is configured to use the BI Engine SQL Interface, then this field contains BiEngineStatistics. Otherwise NULL. |
total_modified_partitions | INTEGER | Total number of partitions the job modified. This field is populated for LOAD and QUERY jobs. |
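As an illustration of the total_bytes_billed column, the following sketch estimates the on-demand cost of recent query jobs. The $5 per TiB rate used here is an assumption for illustration only; check the current on-demand pricing for your region.

```sql
-- Estimate on-demand cost per query job over the last day.
-- The 5.0 USD/TiB rate is an assumption; check current pricing.
SELECT
  job_id,
  total_bytes_billed,
  total_bytes_billed / POW(1024, 4) * 5.0 AS estimated_cost_usd
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = "QUERY"
  AND statement_type != "SCRIPT"
ORDER BY estimated_cost_usd DESC;
```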
Data retention
The INFORMATION_SCHEMA jobs views retain job history for only the past 180 days.
Remarks
When you query INFORMATION_SCHEMA.JOBS_BY_* to find a summary cost of query jobs, exclude the SCRIPT statement type; otherwise, some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of that job.
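For example, a summary of billed bytes for query jobs might apply the filter as in the following sketch (the region qualifier and 7-day window are illustrative):

```sql
-- Total billed bytes for query jobs over the last 7 days.
-- Excluding SCRIPT rows prevents child jobs from being counted twice.
SELECT
  SUM(total_bytes_billed) AS total_bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = "QUERY"
  AND statement_type != "SCRIPT";
```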
Regionality
BigQuery's INFORMATION_SCHEMA jobs views are regionalized. To query these views, you must use a region qualifier.
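For example, a query against the jobs view in the us region uses the region-us qualifier; in this sketch, myproject is a placeholder project ID:

```sql
-- `myproject` is a hypothetical project ID; the region qualifier is required.
SELECT
  job_id,
  state
FROM
  `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
LIMIT 10;
```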
Examples
Example 1: Average slot utilization
The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.
To run the query against a project other than your default project, add the project ID in the following format:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW

Replace the following:
- PROJECT_ID: the ID of the project
- REGION_NAME: the region for your project
- VIEW: the name of the view for which you want to run the query

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console
In the Google Cloud console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Google Cloud console.

SELECT
  SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Filter by the partition column first to limit the amount of data scanned.
  -- Eight days allows for jobs created before the 7 day end_time filter.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
  AND statement_type != "SCRIPT"
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
Click Run.
bq
Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.
To run the query, enter:
bq query --nouse_legacy_sql \
'SELECT
   SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
   -- Filter by the partition column first to limit the amount of data scanned.
   -- Eight days allows for jobs created before the 7 day end_time filter.
   creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
   AND job_type = "QUERY"
   AND statement_type != "SCRIPT"
   AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()'
The results should look like the following:
+-----------+
| avg_slots |
+-----------+
| 3879.1534 |
+-----------+
You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful to determine the percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage from its child jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.

If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).
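The per-job calculation above can be sketched as follows (the 1-day window is illustrative):

```sql
-- Average slots used by each query job over its own runtime.
SELECT
  job_id,
  total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS job_avg_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = "QUERY"
  AND statement_type != "SCRIPT"
  AND end_time > start_time  -- guard against division by zero
ORDER BY job_avg_slots DESC;
```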
Example 2: Load job history
The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (the past 180 days).
To run the query against a project other than your default project, add the project ID in the following format:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console
In the Google Cloud console, open the BigQuery page.
Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Google Cloud console.

SELECT
  DISTINCT(user_email) AS user
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = "LOAD"
Click Run.
bq
Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.
To run the query, enter:
bq query --nouse_legacy_sql \
'SELECT
   DISTINCT(user_email) AS user
FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
   job_type = "LOAD"'
The results should look like the following:
+-------------+
| user        |
+-------------+
| abc@xyz.com |
+-------------+
| def@xyz.com |
+-------------+
Example 3: Most expensive jobs
The following example demonstrates how to find the five jobs that scanned the most bytes in an organization for the current day. You can filter further on statement_type to query for additional information such as loads, exports, and queries.
To run the query against a project other than your default project, add the project ID in the following format:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console
In the Google Cloud console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Google Cloud console.

SELECT
  job_id,
  user_email,
  total_bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
  EXTRACT(DATE FROM creation_time) = current_date()
ORDER BY
  total_bytes_billed DESC
LIMIT 5
Click Run.
bq
Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.
To run the query, enter:
bq query --nouse_legacy_sql \
'SELECT
   job_id,
   user_email,
   total_bytes_billed
FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
   EXTRACT(DATE FROM creation_time) = current_date()
ORDER BY
   total_bytes_billed DESC
LIMIT 5'
The results should look like the following:
+-----------+-------------+--------------------+
| job_id    | user_email  | total_bytes_billed |
+-----------+-------------+--------------------+
| bquxjob_1 | abc@xyz.com | 999999             |
+-----------+-------------+--------------------+
| bquxjob_2 | def@xyz.com | 888888             |
+-----------+-------------+--------------------+
| bquxjob_3 | ghi@xyz.com | 777777             |
+-----------+-------------+--------------------+
| bquxjob_4 | jkl@xyz.com | 666666             |
+-----------+-------------+--------------------+
| bquxjob_5 | mno@xyz.com | 555555             |
+-----------+-------------+--------------------+
Example 4: Pending and running jobs
The following example lists jobs that were started by the current user and are currently in the pending or running state.
To run the query against a project other than your default project, add the project ID in the following format:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
To run the query:
Console
In the Google Cloud console, open the BigQuery page.
In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Google Cloud console.

SELECT
  job_id,
  creation_time,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  state != "DONE"
Click Run.
bq
Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.
To run the query, enter:
bq query --nouse_legacy_sql \
'SELECT
   job_id,
   creation_time,
   query
FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
   state != "DONE"'
The results should look like the following:
+-----------+-------------------------+--------------------------------+
| job_id    | creation_time           | query                          |
+-----------+-------------------------+--------------------------------+
| bquxjob_1 | 2019-10-10 00:00:00 UTC | SELECT ... FROM dataset.table1 |
+-----------+-------------------------+--------------------------------+
| bquxjob_2 | 2019-10-10 00:00:01 UTC | SELECT ... FROM dataset.table2 |
+-----------+-------------------------+--------------------------------+
| bquxjob_3 | 2019-10-10 00:00:02 UTC | SELECT ... FROM dataset.table3 |
+-----------+-------------------------+--------------------------------+
| bquxjob_4 | 2019-10-10 00:00:03 UTC | SELECT ... FROM dataset.table4 |
+-----------+-------------------------+--------------------------------+
| bquxjob_5 | 2019-10-10 00:00:04 UTC | SELECT ... FROM dataset.table5 |
+-----------+-------------------------+--------------------------------+
Example 5: Bytes processed per user identity
The following example shows the total bytes billed for query jobs per user.
SELECT
user_email,
SUM(total_bytes_billed) AS bytes_billed
FROM
`region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE
job_type = 'QUERY'
AND statement_type != 'SCRIPT'
GROUP BY
user_email;
The results should look like the following:
+---------------------+--------------+
| user_email | bytes_billed |
+---------------------+--------------+
| bob@example.com | 2847932416 |
| alice@example.com | 1184890880 |
| charles@example.com | 10485760 |
+---------------------+--------------+
Example 6: Hourly breakdown of bytes processed
The following example shows total bytes billed for query jobs, in hourly intervals.
SELECT
TIMESTAMP_TRUNC(end_time, HOUR) AS time_window,
SUM(total_bytes_billed) AS bytes_billed
FROM
`region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE
job_type = 'QUERY'
AND statement_type != 'SCRIPT'
GROUP BY
time_window
ORDER BY
time_window DESC
The results should look like the following:
+-------------------------+--------------+
| time_window | bytes_billed |
+-------------------------+--------------+
| 2022-05-17 20:00:00 UTC | 1967128576 |
| 2022-05-10 21:00:00 UTC | 0 |
| 2022-04-15 20:00:00 UTC | 10485760 |
| 2022-04-15 17:00:00 UTC | 41943040 |
+-------------------------+--------------+
What's next
- To see an overview of INFORMATION_SCHEMA, go to Introduction to BigQuery INFORMATION_SCHEMA.
- Learn how to use INFORMATION_SCHEMA to get reservations metadata.
- Learn how to use INFORMATION_SCHEMA to get streaming metadata.
- Learn how to use INFORMATION_SCHEMA to get dataset metadata.
- Learn how to use INFORMATION_SCHEMA to get table metadata.
- Learn how to use INFORMATION_SCHEMA to get view metadata.