Getting job metadata by timeslice using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* views to retrieve real-time BigQuery metadata by timeslice. This view contains currently running and completed jobs. Data is retained for 180 days.

Required permissions

Retrieving jobs metadata by using INFORMATION_SCHEMA tables requires appropriately scoped permissions:

  • JOBS_TIMELINE_BY_USER requires bigquery.jobs.list for the project and is available to the Project Viewer and BigQuery User roles.
  • JOBS_TIMELINE_BY_PROJECT requires bigquery.jobs.listAll for the project and is available to the Project Owner and BigQuery Admin roles.
  • JOBS_TIMELINE_BY_FOLDER requires bigquery.jobs.listAll for the parent folder and is available to the Folder Admin and BigQuery Admin for the folder.
  • JOBS_TIMELINE_BY_ORGANIZATION requires bigquery.jobs.listAll for the organization and is available to the Organization bigquery.resourceAdmin, Organization Owner, and Organization Admin roles. Note that JOBS_BY_ORGANIZATION is only available to users with defined Google Cloud organizations.

For more information about granular BigQuery permissions, see roles and permissions.

Schema

When you query the INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* views, the query results contain one row for every second of execution of every BigQuery job. Each period starts on a whole-second interval and lasts exactly one second.

  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_USER returns only the jobs submitted by the current user in the current project.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT returns all jobs submitted in the current project.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_FOLDER returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.

The INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* view has the following schema:

Column name Data type Value
period_start TIMESTAMP Start time of this period.
period_slot_ms INTEGER Slot milliseconds consumed in this period.
project_id STRING (Clustering column) ID of the project.
project_number INTEGER Number of the project.
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
job_id STRING ID of the job. For example, bquxjob_1234.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as script job statement evaluation or materialized view refresh.
statement_type STRING The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, or DELETE.
job_creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
job_start_time TIMESTAMP Start time of this job.
job_end_time TIMESTAMP End time of this job.
state STRING Running state of the job at the end of this period. Valid states include PENDING, RUNNING, and DONE.
reservation_id STRING Name of the primary reservation assigned to this job at the end of this period, if applicable.
total_bytes_processed INTEGER Total bytes processed by the job.
error_result RECORD Details of error (if any) as an ErrorProto.
cache_hit BOOLEAN Whether the query results of this job were from a cache.

Data retention

Data in the INFORMATION_SCHEMA jobs timeline views will be available from sometime in the first week of August 2020 and will be retained for 180 days.

Regionality

BigQuery's INFORMATION_SCHEMA jobs timeline views are regionalized. To query these views, you must use a region qualifier.

Examples

Example 1: Time series of slot utilization

The following example calculates the slot utilization for every second in the last day.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     period_start,
     SUM(period_slot_ms) AS total_slot_ms,
    FROM
    `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    GROUP BY
     period_start
    ORDER BY
     period_start DESC
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   period_start,
   SUM(period_slot_ms) AS total_slot_ms,
 FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
 WHERE
   period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
   AND statement_type != "SCRIPT"
 GROUP BY
   period_start
 ORDER BY
   period_start DESC'

The results should look like the following:

+---------------------+---------------+
|    period_start     | total_slot_ms |
+---------------------+---------------+
| 2020-07-29 03:52:14 |     122415176 |
| 2020-07-29 03:52:15 |     141107048 |
| 2020-07-29 03:52:16 |     173335142 |
| 2020-07-28 03:52:17 |     131107048 |
+---------------------+---------------+
  

You can check usage for a particular reservation with WHERE reservation_id = "…". For script jobs, the parent job also reports the total slot usage from its children jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.

Example 2: Number of RUNNING and PENDING jobs over time

The following example computes the number of RUNNING and PENDING jobs at every second in the last day.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     period_start,
     SUM(IF(state = "PENDING", 1, 0)) as PENDING,
     SUM(IF(state = "RUNNING", 1, 0)) as RUNNING
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    GROUP BY period_start
    
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     period_start,
     SUM(IF(state = "PENDING", 1, 0)) as PENDING,
     SUM(IF(state = "RUNNING", 1, 0)) as RUNNING
   FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
   WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
   GROUP BY period_start'

The results should look like the following:

+---------------------+---------+---------+
|    period_start     | PENDING | RUNNING |
+---------------------+---------+---------+
| 2020-07-29 03:52:14 |       7 |      27 |
| 2020-07-29 03:52:15 |       1 |      21 |
| 2020-07-29 03:52:16 |       5 |      21 |
| 2020-07-29 03:52:17 |       4 |      22 |
+---------------------+---------+---------+
  

Example 3: Resource usage by jobs at a specific point in time

The following example returns the job_id of all jobs running at a specific point in time together with their resource usage during that one-second period.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud Console.

    SELECT
     job_id, period_slot_ms
    FROM
      `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start = "2020-07-29 03:52:14"
     AND statement_type != "SCRIPT"
    
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     job_id, period_slot_ms
   FROM
      `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
   WHERE
     period_start = "2020-07-29 03:52:14"

The results should look like the following:

+------------------+
| job_id | slot_ms |
+------------------+
| job_1  | 2415176 |
| job_2  | 4417245 |
| job_3  |  427416 |
| job_4  | 1458122 |
+------------------+
  

Next steps