Getting jobs metadata using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, and reservations.

You can query the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs, as well as the last 180 days of history of completed jobs.

Required permissions

Retrieving jobs metadata by using the INFORMATION_SCHEMA jobs views requires appropriately scoped permissions:

  • JOBS_BY_USER requires bigquery.jobs.list for the project and is available to the Project Viewer and BigQuery User roles.
  • JOBS_BY_PROJECT requires bigquery.jobs.listAll for the project and is available to the Project Owner and BigQuery Admin roles.
  • JOBS_BY_ORGANIZATION requires bigquery.jobs.listAll for the organization and is available to the Organization Owner and Organization Admin roles. Note that JOBS_BY_ORGANIZATION is only available to users with defined Google Cloud organizations.

Schema

When you query the INFORMATION_SCHEMA.JOBS_BY_* views, the query results contain one row for each BigQuery job.

  • INFORMATION_SCHEMA.JOBS_BY_USER returns only the jobs submitted by the current user in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_PROJECT returns all jobs submitted in the current project.
  • INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION returns all jobs submitted in the organization that is associated with the current project.

The INFORMATION_SCHEMA.JOBS_BY_* views have the following schema:

Column name | Data type | Value
creation_time | TIMESTAMP | (Partitioning column) Creation time of this job.
project_id | STRING | (Clustering column) ID of the project.
project_number | INTEGER | Number of the project.
user_email | STRING | (Clustering column) Email address or service account of the user who ran the job.
job_id | STRING | ID of the job. For example, `bquxjob_1234`.
job_type | STRING | The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or UNKNOWN.
statement_type | STRING | The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, or DELETE.
start_time | TIMESTAMP | Start time of this job.
end_time | TIMESTAMP | End time of this job.
query | STRING | SQL query text. Note: The JOBS_BY_ORGANIZATION view does not have the query column.
state | STRING | Running state of the job. Valid states include PENDING, RUNNING, and DONE.
reservation_id | STRING | Name of the primary reservation assigned to this job, if applicable.
total_bytes_processed | INTEGER | Total bytes processed by the job.
total_slot_ms | INTEGER | Slot-milliseconds for the job over its entire duration.
error_result | RECORD | Details of the error, if any.
cache_hit | BOOLEAN | Whether the query results were cached.
destination_table | RECORD | Destination table for results, if any.
referenced_tables | RECORD | Array of tables referenced by the job.
labels | RECORD | Array of labels applied to the job.
timeline | RECORD | Query timeline of the job. Contains snapshots of query execution.
job_stages | RECORD | Query stages of the job.
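
As a quick illustration of these columns, the following sketch lists the ten most recent query jobs in a project along with their state and slot usage (it assumes the US multi-region; adjust the region qualifier for your location):

```sql
-- Ten most recently created query jobs, with state and slot usage.
-- Assumes the US multi-region; adjust `region-us` as needed.
SELECT
  job_id,
  user_email,
  state,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = "QUERY"
ORDER BY creation_time DESC
LIMIT 10
```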

Data retention

Currently, only the last 180 days of job history is retained in INFORMATION_SCHEMA jobs views.

Regionality

BigQuery's INFORMATION_SCHEMA jobs views are regionalized. To query these views, you must prefix the view name with a region qualifier, in the format `region-region_name`.INFORMATION_SCHEMA.view.

For example:

  • To query data in the US multi-region, use `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  • To query data in the EU multi-region, use `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  • To query data in the asia-northeast1 region, use `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT

You can find the list of available regions in the BigQuery locations documentation.

Examples

Example 1: Average slot utilization

The following example calculates average slot utilization for all queries over the past 7 days for a given project.
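
The denominator in the example query converts slot-milliseconds accumulated over seven days into an average slot count. The arithmetic can be sanity-checked in plain Python (the figures below are illustrative, not taken from a real project):

```python
# Average slots = total slot-milliseconds / milliseconds in the window.
MS_PER_WEEK = 1000 * 60 * 60 * 24 * 7  # 604,800,000 ms in 7 days

def avg_slots(total_slot_ms: int, window_ms: int = MS_PER_WEEK) -> float:
    """Average number of slots held over the window."""
    return total_slot_ms / window_ms

# A workload that holds 100 slots for the entire week accumulates
# 100 * MS_PER_WEEK slot-ms, so the average comes back as 100.
print(avg_slots(100 * MS_PER_WEEK))  # → 100.0
```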

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     job_type = "QUERY"
     AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_slots
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   job_type = "QUERY"
   AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()'

The results should look like the following:

  +------------+
  | avg_slots  |
  +------------+
  | 3879.1534  |
  +------------+
  

You can check usage for a particular reservation by adding WHERE reservation_id = "…" to the query. This is helpful for determining the percentage use of a reservation over a period of time.
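
For example, a per-reservation variant of the query above might look like the following sketch (the reservation name "my-reservation" is a placeholder; substitute your own):

```sql
-- Average slot usage over the past 7 days for one reservation.
-- "my-reservation" is a placeholder name.
SELECT
  SUM(total_slot_ms) / (1000*60*60*24*7) AS avg_reservation_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_type = "QUERY"
  AND reservation_id = "my-reservation"
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
```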

Example 2: Load job history

The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (currently, the last 180 days).

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     DISTINCT(user_email) AS user
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE
     job_type = "LOAD"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   DISTINCT(user_email) AS user
 FROM
   `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
   job_type = "LOAD"'

The results should look like the following:

  +--------------+
  | user         |
  +--------------+
  | abc@xyz.com  |
  +--------------+
  | def@xyz.com  |
  +--------------+
  

Example 3: Most expensive jobs

The following example demonstrates how to find the five jobs that scanned the most bytes in an organization for the current day. You can filter further on job_type to restrict the results to particular kinds of jobs, such as loads, exports, or queries.
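
One way to narrow the ranking, for example to load jobs only, is to add a job_type filter to the query (a sketch; swap in another job_type value as needed):

```sql
-- Five load jobs that scanned the most bytes today.
SELECT
  job_id,
  user_email,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
  EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
  AND job_type = "LOAD"
ORDER BY total_bytes_processed DESC
LIMIT 5
```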

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     user_email,
     total_bytes_processed
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
    WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
    ORDER BY total_bytes_processed DESC
    LIMIT 5
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   user_email,
   total_bytes_processed
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
 WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
 ORDER BY total_bytes_processed DESC
 LIMIT 5'

The results should look like the following:

  +--------------+--------------+---------------------------+
  | job_id       |  user_email  |  total_bytes_processed    |
  +--------------+--------------+---------------------------+
  | bquxjob_1    |  abc@xyz.com |    999999                 |
  +--------------+--------------+---------------------------+
  | bquxjob_2    |  def@xyz.com |    888888                 |
  +--------------+--------------+---------------------------+
  | bquxjob_3    |  ghi@xyz.com |    777777                 |
  +--------------+--------------+---------------------------+
  | bquxjob_4    |  jkl@xyz.com |    666666                 |
  +--------------+--------------+---------------------------+
  | bquxjob_5    |  mno@xyz.com |    555555                 |
  +--------------+--------------+---------------------------+
  

Example 4: Pending and running jobs

The following example lists jobs that were started by the current user and are currently in the pending or running state.

To run the query against a project other than your default project, add the project ID in the following format: `project_id`.`region-region_name`.INFORMATION_SCHEMA.view; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery web UI.

    Go to the Cloud Console

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id,
     creation_time,
     query
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
    WHERE state != "DONE"
    
  3. Click Run.

CLI

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   job_id,
   creation_time,
   query
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
 WHERE state != "DONE"'

The results should look like the following:

  +--------------+---------------------------+---------------------------------+
  | job_id       |  creation_time            |  query                          |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_1    |  2019-10-10 00:00:00 UTC  |  SELECT ... FROM dataset.table1 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_2    |  2019-10-10 00:00:01 UTC  |  SELECT ... FROM dataset.table2 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_3    |  2019-10-10 00:00:02 UTC  |  SELECT ... FROM dataset.table3 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_4    |  2019-10-10 00:00:03 UTC  |  SELECT ... FROM dataset.table4 |
  +--------------+---------------------------+---------------------------------+
  | bquxjob_5    |  2019-10-10 00:00:04 UTC  |  SELECT ... FROM dataset.table5 |
  +--------------+---------------------------+---------------------------------+
  

Next steps