Getting jobs metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA.JOBS_BY_* view to retrieve real-time metadata about BigQuery jobs. This view contains currently running jobs, as well as the last 180 days of history of completed jobs.

Required permissions

Retrieving jobs metadata by using INFORMATION_SCHEMA tables requires appropriately-scoped permissions:

  • JOBS_BY_USER requires bigquery.jobs.list for the project and is available to the Project Viewer and BigQuery User roles.
  • JOBS_BY_PROJECT requires bigquery.jobs.listAll for the project and is available to the Project Owner and BigQuery Admin roles.
  • JOBS_BY_ORGANIZATION requires bigquery.jobs.listAll for the organization and is available to the Organization bigquery.resourceAdmin, Organization Owner and Organization Admin roles. Note that JOBS_BY_ORGANIZATION is only available to users with defined Google Cloud organizations.

For more information about granular BigQuery permissions, see roles and permissions.

Schema

When you query the INFORMATION_SCHEMA.JOBS_BY_* views, the query results contain one row for each BigQuery job.

  • INFORMATION_SCHEMA.JOBS_BY_USER returns only the jobs submitted by the current user in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_PROJECT returns all jobs submitted in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION returns all jobs submitted in the organization that is associated with the current project.

The INFORMATION_SCHEMA.JOBS_BY_* view has the following schema:

Column name Data type Value
creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
project_id STRING (Clustering column) ID of the project.
project_number INTEGER Number of the project.
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
job_id STRING ID of the job. For example, `bquxjob_1234`.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as script job statement evaluation or materialized view refresh.
statement_type STRING The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, or DELETE.
start_time TIMESTAMP Start time of this job.
end_time TIMESTAMP End time of this job.
query STRING SQL query text. Note: The JOBS_BY_ORGANIZATION view does not have the query column.
state STRING Running state of the job. Valid states include PENDING, RUNNING, and DONE.
reservation_id STRING Name of the primary reservation assigned to this job, if applicable.
total_bytes_processed INTEGER Total bytes processed by the job.
total_slot_ms INTEGER Slot-milliseconds for the job over its entire duration.
error_result RECORD Details of error (if any) as an ErrorProto.
cache_hit BOOLEAN Whether the query results were cached.
destination_table RECORD Destination table for results (if any).
referenced_tables RECORD Array of tables referenced by the job.
labels RECORD Array of labels applied to the job as key, value strings.
timeline RECORD Query timeline of the job. Contains snapshots of query execution.
job_stages RECORD Query stages of the job.

Data retention

Currently, only the last 180 days of job history is retained in INFORMATION_SCHEMA jobs views.

Regionality

BigQuery's INFORMATION_SCHEMA jobs views are regionalized. To query these views, you must use a region qualifier.

Examples

Example 1: Average slot utilization

The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     -- Filter by the partition column first to limit the amount of data scanned. Eight days
     -- allows for jobs created before the 7 day end_time filter.
     creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
     AND job_type = "QUERY"
     AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   -- Filter by the partition column first to limit the amount of data scanned. Eight days
   -- allows for jobs created before the 7 day end_time filter.
   creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
   AND job_type = "QUERY"
   AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()'

The results should look like the following:

  +------------+
  | avg_slots  |
  +------------+
  | 3879.1534  |
  +------------+
  

You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful to determine percentage use of a reservation over a period of time.

If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time,start_time, MILLISECOND)

Example 2: Load job history

The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (for example, the last 30 days).

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     DISTINCT(user_email) AS user
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     job_type = "LOAD"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   DISTINCT(user_email) AS user
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   job_type = "LOAD"'

The results should look like the following:

  +--------------+
  | user         |
  +--------------+
  | abc@xyz.com  |
  +--------------+
  | def@xyz.com  |
  +--------------+
  

Example 3: Most expensive jobs

The following example demonstrates how to find the five jobs that scanned the most bytes in an organization for the current day. You can filter further on statement_type to query for additional information such as loads, exports, and queries.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     user_email,
     total_bytes_processed
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
    WHERE EXTRACT(DATE FROM  creation_time) = current_date()
    ORDER BY total_bytes_processed DESC
    LIMIT 5
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   user_email,
   total_bytes_processed
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
 WHERE EXTRACT(DATE FROM  creation_time) = current_date()
 ORDER BY total_bytes_processed DESC
 LIMIT 5'

The results should look like the following:

  +--------------+--------------+---------------------------+
  | job_id       |  user_email  |  total_bytes_processed    |
  +--------------+--------------+---------------------------+
  | bquxjob_1    |  abc@xyz.com |    999999                 |
  +--------------+--------------+---------------------------+
  | bquxjob_2    |  def@xyz.com |    888888                 |
  +--------------+--------------+---------------------------+
  | bquxjob_3    |  ghi@xyz.com |    777777                 |
  +--------------+--------------+---------------------------+
  | bquxjob_4    |  jkl@xyz.com |    666666                 |
  +--------------+--------------+---------------------------+
  | bquxjob_5    |  mno@xyz.com |    555555                 |
  +--------------+--------------+---------------------------+
  

Example 4: Pending and running jobs

The following example lists jobs that were started by the current user and are currently in the pending or running state.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     creation_time,
     query
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
    WHERE state != "DONE"
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   creation_time,
   query
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
 WHERE state != "DONE"'

The results should look like the following:

  +--------------+--------------+----------------------------------------------+
  | job_id       |  creation_time            |  query                          |
  +--------------+--------------+----------------------------------------------+
  | bquxjob_1    |  2019-10-10 00:00:00 UTC  |  SELECT ... FROM dataset.table1 |
  +--------------+--------------+----------------------------------------------+
  | bquxjob_2    |  2019-10-10 00:00:01 UTC  |  SELECT ... FROM dataset.table2 |
  +--------------+--------------+----------------------------------------------+
  | bquxjob_3    |  2019-10-10 00:00:02 UTC  |  SELECT ... FROM dataset.table3 |
  +--------------+--------------+----------------------------------------------+
  | bquxjob_4    |  2019-10-10 00:00:03 UTC  |  SELECT ... FROM dataset.table4 |
  +--------------+--------------+----------------------------------------------+
  | bquxjob_5    |  2019-10-10 00:00:04 UTC  |  SELECT ... FROM dataset.table5 |
  +--------------+--------------+----------------------------------------------+
  

Next steps