Getting jobs metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs as well as the history of jobs completed in the past 180 days.

Required permissions

Retrieving job metadata by using INFORMATION_SCHEMA views requires appropriately scoped permissions:

  • JOBS_BY_USER requires bigquery.jobs.list for the project and is available to the Project Viewer and BigQuery User roles.
  • JOBS_BY_PROJECT requires bigquery.jobs.listAll for the project and is available to the Project Owner and BigQuery Admin roles.
  • JOBS_BY_FOLDER requires bigquery.jobs.listAll for the parent folder and is available to the Folder Admin and BigQuery Admin roles for the folder.
  • JOBS_BY_ORGANIZATION requires bigquery.jobs.listAll for the organization and is available to the Organization bigquery.resourceAdmin, Organization Owner, and Organization Admin roles. Note that JOBS_BY_ORGANIZATION is available only to users whose projects belong to a defined Google Cloud organization.

For more information about granular BigQuery permissions, see roles and permissions.

Schema

When you query the INFORMATION_SCHEMA.JOBS_BY_* views, the query results contain one row for each BigQuery job.

  • INFORMATION_SCHEMA.JOBS_BY_USER returns only the jobs submitted by the current user in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_PROJECT returns all jobs submitted in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_FOLDER returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.
  • INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION returns all jobs submitted in the organization that is associated with the current project.

The INFORMATION_SCHEMA.JOBS_BY_* view has the following schema:

Column name Data type Value
creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
project_id STRING (Clustering column) ID of the project.
project_number INTEGER Number of the project.
folder_numbers REPEATED INTEGER Google Accounts and ID Administration (GAIA) IDs of folders in a project's ancestry, in order starting with the leaf folder closest to the project. This column is only populated in JOBS_BY_FOLDER.
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
job_id STRING ID of the job. For example, bquxjob_1234.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. A null job type indicates an internal job, such as a script job statement evaluation or a materialized view refresh.
statement_type STRING The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, DELETE, or SCRIPT. See QueryStatementType for a list of valid values.
priority STRING The priority of this job. Valid values include INTERACTIVE and BATCH.
start_time TIMESTAMP Start time of this job.
end_time TIMESTAMP End time of this job.
query STRING SQL query text. Note: Only the JOBS_BY_PROJECT view has the query column.
state STRING Running state of the job. Valid states include PENDING, RUNNING, and DONE.
reservation_id STRING Name of the primary reservation assigned to this job, if applicable. If your job ran in a project that is assigned to a reservation, it would follow this format: RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME

In this output:

  • RESERVATION_ADMIN_PROJECT: the name of the Google Cloud project that administers the reservation
  • RESERVATION_LOCATION: the location of the reservation
  • RESERVATION_NAME: the name of the reservation
total_bytes_processed INTEGER Total bytes processed by the job.
total_slot_ms INTEGER Slot milliseconds for the job over its entire duration.
error_result RECORD Details of error, if any, as an ErrorProto.
cache_hit BOOLEAN Whether the query results of this job were from a cache.
destination_table RECORD Destination table for results, if any.
referenced_tables RECORD Array of tables referenced by the job.
labels RECORD Array of labels applied to the job as key, value strings.
timeline RECORD Query timeline of the job. Contains snapshots of query execution.
job_stages RECORD Query stages of the job.
total_bytes_billed INTEGER If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only.
parent_job_id STRING ID of the parent job, if any.
transaction_id STRING ID of the transaction in which this job ran, if any. (Preview)
dml_statistics RECORD

If the job is a query with a DML statement, the value is a record with the following fields:

  • inserted_row_count: The number of rows that were inserted.
  • deleted_row_count: The number of rows that were deleted.
  • updated_row_count: The number of rows that were updated.

For all other jobs, the value is NULL.

This column is present in the INFORMATION_SCHEMA.JOBS_BY_USER and INFORMATION_SCHEMA.JOBS_BY_PROJECT views.

Data retention

Only the job history of the past 180 days is retained in the INFORMATION_SCHEMA jobs views.

Remarks

When you query INFORMATION_SCHEMA.JOBS_BY_* to find a summary cost of query jobs, exclude the SCRIPT statement type; otherwise, some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of the script.
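The double-counting risk can be sketched with hypothetical rows (the job IDs and slot values below are made up for illustration, not real view output): a SCRIPT parent row already aggregates the slot milliseconds of its child jobs, so summing across all rows counts the children twice.

```python
# Hypothetical job rows shaped like INFORMATION_SCHEMA.JOBS_BY_* output.
# The SCRIPT parent's total_slot_ms summarizes its children (1000 + 2000).
jobs = [
    {"job_id": "script_1", "statement_type": "SCRIPT", "parent_job_id": None,       "total_slot_ms": 3000},
    {"job_id": "child_1",  "statement_type": "SELECT", "parent_job_id": "script_1", "total_slot_ms": 1000},
    {"job_id": "child_2",  "statement_type": "INSERT", "parent_job_id": "script_1", "total_slot_ms": 2000},
]

# Summing every row counts child work twice via the SCRIPT parent.
naive_total = sum(j["total_slot_ms"] for j in jobs)

# Excluding SCRIPT rows (the WHERE statement_type != "SCRIPT" filter)
# yields the true consumption.
correct_total = sum(j["total_slot_ms"] for j in jobs
                    if j["statement_type"] != "SCRIPT")

print(naive_total, correct_total)   # 6000 3000
```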

Regionality

BigQuery's INFORMATION_SCHEMA jobs views are regionalized. To query these views, you must use a region qualifier.

Examples

Example 1: Average slot utilization

The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW
Replace the following:

  • PROJECT_ID: the ID of the project
  • REGION_NAME: the region for your project
  • VIEW: the name of the view for which you want to run the query

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
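If you build these qualified names programmatically, the format above can be assembled with a small helper. This sketch is illustrative only; the `qualified_view` name is ours, and only the backtick-quoting format comes from the documentation.

```python
def qualified_view(view, region, project_id=None):
    """Build a region-qualified INFORMATION_SCHEMA view path.

    When project_id is omitted, the default project is implied,
    matching the `region-REGION_NAME`.INFORMATION_SCHEMA.VIEW form.
    """
    parts = []
    if project_id:
        parts.append(f"`{project_id}`")
    parts.append(f"`region-{region}`")
    parts.append(f"INFORMATION_SCHEMA.{view}")
    return ".".join(parts)

print(qualified_view("JOBS_BY_PROJECT", "us", "myproject"))
# `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
```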

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     -- Filter by the partition column first to limit the amount of data scanned. Eight days
     -- allows for jobs created before the 7 day end_time filter.
     creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
     AND job_type = "QUERY"
     AND statement_type != "SCRIPT"
     AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   -- Filter by the partition column first to limit the amount of data scanned. Eight days
   -- allows for jobs created before the 7 day end_time filter.
   creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
   AND job_type = "QUERY"
   AND statement_type != "SCRIPT"
   AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()'

The results should look like the following:

  +------------+
  | avg_slots  |
  +------------+
  | 3879.1534  |
  +------------+
  

You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful for determining the percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage from its child jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.
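The percentage-use calculation can be sketched as follows. Note the view reports only consumption (total_slot_ms); the reservation's slot capacity is not in the view, so the `reservation_slots` figure below is an assumed input you would supply yourself, and all numbers are hypothetical.

```python
# Hypothetical inputs: slot-ms consumed over 7 days (from the view) and
# the reservation's purchased slot capacity (assumed, supplied separately).
total_slot_ms = 500 * 1000 * 60 * 60 * 24 * 7   # consumption equal to 500 slots, fully used
reservation_slots = 1000                        # assumed reservation size
period_ms = 1000 * 60 * 60 * 24 * 7             # 7 days in milliseconds

# Capacity over the period is slots * milliseconds; utilization is the ratio.
utilization_pct = 100 * total_slot_ms / (reservation_slots * period_ms)
print(f"{utilization_pct:.1f}%")   # 50.0%
```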

If instead you want to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).
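The per-job formula can be sketched in Python with hypothetical numbers (the timestamps and slot-ms below are made up): TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) in SQL corresponds to the elapsed-milliseconds calculation here.

```python
from datetime import datetime, timedelta

# Hypothetical job: ran for 20 seconds and consumed 400,000 slot-ms.
start_time = datetime(2021, 6, 1, 12, 0, 0)
end_time = start_time + timedelta(seconds=20)
total_slot_ms = 400_000

# Elapsed wall-clock time in milliseconds (the TIMESTAMP_DIFF equivalent).
elapsed_ms = (end_time - start_time) / timedelta(milliseconds=1)

# Average slots held over the job's duration.
avg_slots = total_slot_ms / elapsed_ms
print(avg_slots)   # 20.0
```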

Example 2: Load job history

The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (the past 180 days).

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW
For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     DISTINCT(user_email) AS user
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     job_type = "LOAD"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   DISTINCT(user_email) AS user
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   job_type = "LOAD"'

The results should look like the following:

  +--------------+
  | user         |
  +--------------+
  | abc@xyz.com  |
  +--------------+
  | def@xyz.com  |
  +--------------+
  

Example 3: Most expensive jobs

The following example demonstrates how to find the five jobs that scanned the most bytes in an organization for the current day. You can filter further on statement_type to query for additional information such as loads, exports, and queries.

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW
For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     user_email,
     total_bytes_processed
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
    WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
    ORDER BY total_bytes_processed DESC
    LIMIT 5
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   user_email,
   total_bytes_processed
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
 WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
 ORDER BY total_bytes_processed DESC
 LIMIT 5'

The results should look like the following:

  +--------------+--------------+---------------------------+
  | job_id       |  user_email  |  total_bytes_processed    |
  +--------------+--------------+---------------------------+
  | bquxjob_1    |  abc@xyz.com |    999999                 |
  +--------------+--------------+---------------------------+
  | bquxjob_2    |  def@xyz.com |    888888                 |
  +--------------+--------------+---------------------------+
  | bquxjob_3    |  ghi@xyz.com |    777777                 |
  +--------------+--------------+---------------------------+
  | bquxjob_4    |  jkl@xyz.com |    666666                 |
  +--------------+--------------+---------------------------+
  | bquxjob_5    |  mno@xyz.com |    555555                 |
  +--------------+--------------+---------------------------+
  

Example 4: Pending and running jobs

The following example lists jobs that were started by the current user and are currently in the pending or running state.

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW
For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     creation_time,
     query
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
    WHERE state != "DONE"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   creation_time,
   query
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
 WHERE state != "DONE"'

The results should look like the following:

  +--------------+---------------------------+---------------------------------+
  | job_id       |  creation_time            |  query                          |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_1    |  2019-10-10 00:00:00 UTC  |  SELECT ... FROM dataset.table1 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_2    |  2019-10-10 00:00:01 UTC  |  SELECT ... FROM dataset.table2 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_3    |  2019-10-10 00:00:02 UTC  |  SELECT ... FROM dataset.table3 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_4    |  2019-10-10 00:00:03 UTC  |  SELECT ... FROM dataset.table4 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_5    |  2019-10-10 00:00:04 UTC  |  SELECT ... FROM dataset.table5 |
  +--------------+---------------------------+---------------------------------+
  

Next steps