Getting job metadata by timeslice using INFORMATION_SCHEMA

INFORMATION_SCHEMA is a series of views that provide access to metadata about datasets, routines, tables, views, jobs, reservations, and streaming data.

You can query the INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* views to retrieve real-time BigQuery metadata by timeslice. This view contains currently running and completed jobs. Data is retained for 180 days.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To retrieve jobs metadata by using INFORMATION_SCHEMA tables, you need the following IAM permissions:

  • For JOBS_TIMELINE_BY_USER, you need the following:

    • bigquery.jobs.list for the project

    Each of the following predefined IAM roles includes the preceding permissions:

    • Project Viewer
    • BigQuery User
  • For JOBS_TIMELINE_BY_PROJECT, you need the following:

    • bigquery.jobs.listAll for the project

    Each of the following predefined IAM roles includes the preceding permissions:

    • Project Owner
    • BigQuery Admin
  • For JOBS_TIMELINE_BY_FOLDER, you need the following:

    • bigquery.jobs.listAll for the parent folder

    Each of the following predefined IAM roles includes the preceding permissions:

    • Folder Admin
    • BigQuery Admin
  • For JOBS_TIMELINE_BY_ORGANIZATION, you need the following:

    • bigquery.jobs.listAll for the organization

    Each of the following predefined IAM roles includes the preceding permissions:

    • Organization bigquery.resourceAdmin
    • Organization Owner
    • Organization Admin

      The JOBS_BY_ORGANIZATION schema table is only available to users with defined Google Cloud organizations.

For more information about granular BigQuery permissions, see roles and permissions.

Schema

When you query the INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* views, the query results contain one row for every second of execution of every BigQuery job. Each period starts on a whole-second interval and lasts exactly one second.

  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_USER returns only the jobs submitted by the current user in the current project.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT returns all jobs submitted in the current project.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_FOLDER returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.
  • INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION returns all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.

The INFORMATION_SCHEMA.JOBS_TIMELINE_BY_* view has the following schema:

Column name Data type Value
period_start TIMESTAMP Start time of this period.
period_slot_ms INTEGER Slot milliseconds consumed in this period.
project_id STRING (Clustering column) ID of the project.
project_number INTEGER Number of the project.
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
job_id STRING ID of the job. For example, bquxjob_1234.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as script job statement evaluation or materialized view refresh.
statement_type STRING The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, or DELETE.
job_creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
job_start_time TIMESTAMP Start time of this job.
job_end_time TIMESTAMP End time of this job.
state STRING Running state of the job at the end of this period. Valid states include PENDING, RUNNING, and DONE.
reservation_id STRING Name of the primary reservation assigned to this job at the end of this period, if applicable.
total_bytes_processed INTEGER Total bytes processed by the job.
error_result RECORD Details of error (if any) as an ErrorProto.
cache_hit BOOLEAN Whether the query results of this job were from a cache.

Data retention

Data in the INFORMATION_SCHEMA jobs timeline views will be retained for 180 days.

Regionality

BigQuery's INFORMATION_SCHEMA jobs timeline views are regionalized. To query these views, you must use a region qualifier.

Examples

Example 1: Time series of slot utilization

The following example calculates the slot utilization for every second in the last day.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.

    SELECT
     period_start,
     SUM(period_slot_ms) AS total_slot_ms,
    FROM
    `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    GROUP BY
     period_start
    ORDER BY
     period_start DESC
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
   period_start,
   SUM(period_slot_ms) AS total_slot_ms,
 FROM
  `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
 WHERE
   period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
   AND statement_type != "SCRIPT"
 GROUP BY
   period_start
 ORDER BY
   period_start DESC'

The results should look like the following:

+---------------------+---------------+
|    period_start     | total_slot_ms |
+---------------------+---------------+
| 2020-07-29 03:52:14 |     122415176 |
| 2020-07-29 03:52:15 |     141107048 |
| 2020-07-29 03:52:16 |     173335142 |
| 2020-07-28 03:52:17 |     131107048 |
+---------------------+---------------+
  

You can check usage for a particular reservation with WHERE reservation_id = "…". For script jobs, the parent job also reports the total slot usage from its children jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.

Example 2: Number of RUNNING and PENDING jobs over time

The following example computes the number of RUNNING and PENDING jobs at every second in the last day.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT.

To run the query:

Console

  1. In the Cloud console, open the BigQuery page.

    Go to the BigQuery page

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.

    SELECT
     period_start,
     SUM(IF(state = "PENDING", 1, 0)) as PENDING,
     SUM(IF(state = "RUNNING", 1, 0)) as RUNNING
    FROM
     `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    GROUP BY period_start
    
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     period_start,
     SUM(IF(state = "PENDING", 1, 0)) as PENDING,
     SUM(IF(state = "RUNNING", 1, 0)) as RUNNING
   FROM
     `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
   WHERE
     period_start BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
   GROUP BY period_start'

The results should look like the following:

+---------------------+---------+---------+
|    period_start     | PENDING | RUNNING |
+---------------------+---------+---------+
| 2020-07-29 03:52:14 |       7 |      27 |
| 2020-07-29 03:52:15 |       1 |      21 |
| 2020-07-29 03:52:16 |       5 |      21 |
| 2020-07-29 03:52:17 |       4 |      22 |
+---------------------+---------+---------+
  

Example 3: Resource usage by jobs at a specific point in time

The following example returns the job_id of all jobs running at a specific point in time together with their resource usage during that one-second period.

To run the query against a project other than your default project, add the project ID in the following format: `PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.VIEW; for example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

To run the query:

Console

  1. In the Cloud console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.

    SELECT
     job_id, period_slot_ms
    FROM
      `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
    WHERE
     period_start = "2020-07-29 03:52:14"
     AND statement_type != "SCRIPT"
    
    
  3. Click Run.

bq

Use the query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --nouse_legacy_sql \
'SELECT
     job_id, period_slot_ms
   FROM
      `reservation-admin-project.region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
   WHERE
     period_start = "2020-07-29 03:52:14"

The results should look like the following:

+------------------+
| job_id | slot_ms |
+------------------+
| job_1  | 2415176 |
| job_2  | 4417245 |
| job_3  |  427416 |
| job_4  | 1458122 |
+------------------+
  

Get slot usage compared to slot capacity

Reservation information can be combined with the jobs timeline by using the reservation timeline view. The reservation timeline view shows per-minute timeslices of reservation metadata for each reservation admin project.

Reservations Timeline Schema

When you query the INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_* views, the query results contain one row for every minute of every BigQuery reservation. Each period starts on a whole-minute interval and lasts exactly one minute.

The INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_PROJECT view has the following schema:

Column name Data type Value
period_start TIMESTAMP Start time of this one-minute period.
project_id STRING ID of the reservation admin project.
project_number INTEGER Number of the project.
reservation_name STRING The name of the reservation.
ignore_idle_slots BOOLEAN False if slot sharing is enabled, otherwise true.
slots_assigned INTEGER The number of slots assigned to this reservation.
slots_max_assigned INTEGER The maximum slot capacity for this reservation, including slot sharing. If ignore_idle_slots is true, this is the same as slots_assigned, otherwise this is the total number of slots in all capacity commitments in the admin project.
reservation_id STRING For joining with the jobs_timeline table. This is of the form project_id:location.reservation_name.

Examples

Example 1: See total slot usage per minute

The following example shows per-minute slot usage from projects assigned to YOUR_RESERVATION_ID across all jobs.

Console

  1. In the Cloud console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.

    SELECT
     res.period_start,
     SUM(jobs.period_slot_ms) / 1000 / 60 AS period_slot_minutes,
     ANY_VALUE(res.slots_assigned) AS slot_assigned,
     ANY_VALUE(res.slots_max_assigned) AS slots_max_assigned
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION jobs
    JOIN
     `region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_PROJECT res
    ON TIMESTAMP_TRUNC(jobs.period_start, MINUTE) = res.period_start AND jobs.reservation_id = res.reservation_id
    WHERE jobs.job_creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    AND res.reservation_id = 'YOUR_RESERVATION_ID'
    GROUP BY period_start
    ORDER BY period_start DESC
    
  3. click Run.

bq

Use the bq query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --use_legacy_sql=false \
'SELECT
  res.period_start,
  SUM(jobs.period_slot_ms) / 1000 / 60 AS period_slot_minutes,
  ANY_VALUE(res.slots_assigned) AS slot_assigned,
  ANY_VALUE(res.slots_max_assigned) AS slots_max_assigned
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION jobs
JOIN
  `region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_PROJECT res
ON TIMESTAMP_TRUNC(jobs.period_start, MINUTE) = res.period_start AND jobs.reservation_id = res.reservation_id
WHERE jobs.job_creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
AND res.reservation_id = 'YOUR_RESERVATION_ID'
GROUP BY period_start
ORDER BY period_start DESC'

The results should look similar to the following:

  +-----------------------+---------------------+---------------+-------------------+
  |     period_start      | period_slot_minutes | slots_assigned| slots_max_assigned|
  +-----------------------+---------------------+---------------+-------------------+
  |2021-06-08 21:33:00 UTC|       100.000       |      100      |        100        |
  |2021-06-08 21:32:00 UTC|        96.753       |      100      |        100        |
  |2021-06-08 21:31:00 UTC|        41.668       |      100      |        100        |
  +-----------------------+---------------------+---------------+-------------------+
  

Example 2: Slot usage by reservation

The following example shows per-minute slot usage for each reservation in the last day.

Console

  1. In the Cloud console, open the BigQuery page.

    Go to the BigQuery page

  2. In the Query editor box, enter the following standard SQL query. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the Cloud console.

    SELECT
     res.period_start,
     res.reservation_id,
     SUM(jobs.period_slot_ms) / 1000 / 60 AS period_slot_minutes,
     ANY_VALUE(res.slots_assigned) AS slots_assigned,
     ANY_VALUE(res.slots_max_assigned) AS slots_max_assigned,
    FROM
     `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION jobs
    JOIN
     `region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_PROJECT res
    ON
     TIMESTAMP_TRUNC(jobs.period_start, MINUTE) = res.period_start
     AND jobs.reservation_id = res.reservation_id
    WHERE
     jobs.job_creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
     AND CURRENT_TIMESTAMP()
    GROUP BY
     period_start,
     reservation_id
    ORDER BY
     period_start DESC, reservation_id
    
  3. click Run.

bq

Use the bq query command and specify standard SQL syntax by using the --nouse_legacy_sql or --use_legacy_sql=false flag. Standard SQL syntax is required for INFORMATION_SCHEMA queries.

To run the query, enter:

bq query --use_legacy_sql=false \
'SELECT
  res.period_start,
  res.reservation_id,
  SUM(jobs.period_slot_ms) / 1000 / 60 AS period_slot_minutes,
  ANY_VALUE(res.slots_assigned) AS slots_assigned,
  ANY_VALUE(res.slots_max_assigned) AS slots_max_assigned,
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT jobs
JOIN
  `region-us`.INFORMATION_SCHEMA.RESERVATIONS_TIMELINE_BY_PROJECT res
ON
  TIMESTAMP_TRUNC(jobs.period_start, MINUTE) = res.period_start
  AND jobs.reservation_id = res.reservation_id
WHERE
  jobs.job_creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND CURRENT_TIMESTAMP()
GROUP BY
  period_start,
  reservation_id
ORDER BY
  period_start DESC, reservation_id'

The results should look similar to the following:

  +-----------------------+----------------+---------------------+---------------+--------------------+
  |     period_start      | reservation_id | period_slot_minutes | slot_assigned | slots_max_assigned |
  +-----------------------+----------------+---------------------+---------------+--------------------+
  |2021-06-08 21:33:00 UTC|     prod01     |       100.000       |      100      |        100         |
  |2021-06-08 21:33:00 UTC|     prod02     |       177.201       |      200      |        500         |
  |2021-06-08 21:32:00 UTC|     prod01     |        96.753       |      100      |        100         |
  |2021-06-08 21:32:00 UTC|     prod02     |       182.329       |      200      |        500         |
  +-----------------------+----------------+---------------------+---------------+--------------------+
  

What's next