JOBS view

The INFORMATION_SCHEMA.JOBS view contains real-time metadata about all BigQuery jobs in the current project.

Required role

To get the permission that you need to query the INFORMATION_SCHEMA.JOBS view, ask your administrator to grant you the BigQuery Resource Viewer (roles/bigquery.resourceViewer) IAM role on your project. For more information about granting roles, see Manage access.

This predefined role contains the bigquery.jobs.listAll permission, which is required to query the INFORMATION_SCHEMA.JOBS view. You might also be able to get this permission with custom roles or other predefined roles.

For more information about BigQuery permissions, see Access control with IAM.

Schema

The underlying data is partitioned by the creation_time column and clustered by project_id and user_email.

The INFORMATION_SCHEMA.JOBS view has the following schema:

Column name Data type Value
creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
project_id STRING (Clustering column) ID of the project.
project_number INTEGER Number of the project.
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
job_id STRING ID of the job. For example, bquxjob_1234.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or null. Job type null indicates an internal job, such as script job statement evaluation or materialized view refresh.
statement_type STRING The type of query statement, if valid. For example, SELECT, INSERT, UPDATE, DELETE, or SCRIPT. See QueryStatementType for the list of valid values.
priority STRING The priority of this job. Valid values include INTERACTIVE and BATCH.
start_time TIMESTAMP Start time of this job.
end_time TIMESTAMP End time of this job.
query STRING SQL query text. Note: Only the JOBS_BY_PROJECT view has the query column.
state STRING Running state of the job. Valid states include PENDING, RUNNING, and DONE.
reservation_id STRING Name of the primary reservation assigned to this job, if applicable. If your job ran in a project that is assigned to a reservation, it would follow this format: RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME

In this output:

  • RESERVATION_ADMIN_PROJECT: the name of the Google Cloud project that administers the reservation
  • RESERVATION_LOCATION: the location of the reservation
  • RESERVATION_NAME: the name of the reservation
total_bytes_processed INTEGER Total bytes processed by the job.
total_slot_ms INTEGER Slot milliseconds for the job over its entire duration.
error_result RECORD Details of error, if any, as an ErrorProto.
cache_hit BOOLEAN Whether the query results of this job were from a cache.
destination_table RECORD Destination table for results, if any.
referenced_tables RECORD Array of tables referenced by the job. Only populated for query jobs.
labels RECORD Array of labels applied to the job as key, value strings.
timeline RECORD Query timeline of the job. Contains snapshots of query execution.
job_stages RECORD Query stages of the job.
total_bytes_billed INTEGER If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only.
parent_job_id STRING ID of the parent job, if any.
transaction_id STRING ID of the transaction in which this job ran, if any. (Preview)
session_info RECORD Details about the session in which this job ran, if any. (Preview)
dml_statistics RECORD

If the job is a query with a DML statement, the value is a record with the following fields:

  • inserted_row_count: The number of rows that were inserted.
  • deleted_row_count: The number of rows that were deleted.
  • updated_row_count: The number of rows that were updated.

For all other jobs, the value is NULL.

This column is present in the INFORMATION_SCHEMA.JOBS_BY_USER and INFORMATION_SCHEMA.JOBS_BY_PROJECT views.

bi_engine_statistics RECORD If the project is configured to use BI Engine SQL Interface, then this field contains BiEngineStatistics. Otherwise, the value is NULL.
total_modified_partitions INTEGER Total number of partitions the job modified. This field is populated for LOAD and QUERY jobs.
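
Several of these columns are RECORD types, so their fields are read with dot notation. The following is a minimal sketch of reading dml_statistics for recent DML jobs. The us region and the one-day window are illustrative assumptions, and error_result.reason is assumed from the ErrorProto definition.

```sql
-- Sketch: read nested RECORD fields for recent DML jobs.
-- Assumptions: the `region-us` qualifier and the 1-day window are illustrative.
SELECT
  job_id,
  statement_type,
  dml_statistics.inserted_row_count,
  dml_statistics.deleted_row_count,
  dml_statistics.updated_row_count,
  error_result.reason AS error_reason  -- NULL when the job reported no error
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  -- dml_statistics is NULL for jobs that are not DML queries.
  AND dml_statistics IS NOT NULL;
```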

Data retention

This view contains currently running jobs and the job history of the past 180 days.

Scope and syntax

Queries against this view must include a region qualifier. The following table explains the region scope for this view:

View name Resource scope Region scope
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.JOBS[_BY_PROJECT] Project level REGION
Replace the following:

  • Optional: PROJECT_ID: the ID of your Cloud project. If not specified, the default project is used.
  • REGION: any dataset region name. For example, us, which gives the qualifier `region-us`.

When you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, exclude the SCRIPT statement type; otherwise, some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of this job.
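
To see where the double counting comes from, a script's parent row can be listed next to its child rows by using the parent_job_id column. This is a minimal sketch: SCRIPT_JOB_ID is a hypothetical placeholder for the ID of a script job, and the us region and 7-day window are assumptions.

```sql
-- Sketch: list a script's parent job together with its child jobs.
-- SCRIPT_JOB_ID is a placeholder; replace it with the job ID of a script.
SELECT
  job_id,
  parent_job_id,       -- NULL for the parent row itself
  statement_type,      -- SCRIPT for the parent row
  total_slot_ms,
  total_bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND (job_id = 'SCRIPT_JOB_ID' OR parent_job_id = 'SCRIPT_JOB_ID')
ORDER BY
  creation_time;
```

If the SCRIPT row's totals match the sum over the child rows, adding all rows together would count that usage twice.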

Examples

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
Replace the following:

  • PROJECT_ID: the ID of the project.
  • REGION_NAME: the region for your project.

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.
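
As a sketch of the fully qualified form in a complete query, the following lists the ten most recently created jobs in myproject. The column selection, one-day window, and LIMIT are illustrative choices.

```sql
-- Sketch: query the view in a project other than the default project.
SELECT
  job_id,
  job_type,
  state,
  creation_time
FROM
  `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY
  creation_time DESC
LIMIT 10;
```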

The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.

To run the query:

SELECT
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter by the partition column first to limit the amount of data scanned.
  -- Eight days allows for jobs created before the 7 day end_time filter.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
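  AND job_type = 'QUERY'
  -- Exclude script parent jobs, which repeat the slot usage of their child jobs.
  AND statement_type != 'SCRIPT'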
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP();

The result is similar to the following:

+------------+
| avg_slots  |
+------------+
| 3879.1534  |
+------------+

You can check usage for a particular reservation with WHERE reservation_id = "…". This can help determine the percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage of its child jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.
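
Combining both filters might look like the following sketch. The reservation_id value reuses the placeholder format from the schema table and must be replaced with a real reservation. The us region and 7-day window are assumptions. Because the view is project scoped, the result covers only this project's jobs in that reservation.

```sql
-- Sketch: average slots used over 7 days by this project's query jobs
-- that ran in one reservation, excluding script parent jobs.
SELECT
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_reservation_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter on the partitioning column first to limit the data scanned.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
  -- Placeholder value; substitute the actual reservation identifier.
  AND reservation_id = 'RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME';
```

Dividing avg_reservation_slots by the reservation's slot capacity gives an approximate percentage use for the period.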

If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).
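
For example, a job with a total_slot_ms of 60000 that ran for 10000 milliseconds averaged 6 slots. The expression can be placed in a query such as the following sketch, where the us region, one-day window, and LIMIT are assumptions. SAFE_DIVIDE is used here so that a job with zero elapsed time returns NULL instead of raising a division error.

```sql
-- Sketch: average slot utilization per finished query job.
SELECT
  job_id,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS elapsed_ms,
  SAFE_DIVIDE(
    total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)
  ) AS avg_job_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'  -- only finished jobs have a final end_time
ORDER BY
  avg_job_slots DESC
LIMIT 20;
```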

Example: Load job history

The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (up to the past 180 days, per the retention period described earlier).

SELECT
  DISTINCT(user_email) AS user
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'LOAD';

The result is similar to the following:

+--------------+
| user         |
+--------------+
| abc@xyz.com  |
+--------------+
| def@xyz.com  |
+--------------+
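
To avoid scanning the full history, the same query can be bounded on the creation_time partitioning column. This is a minimal sketch; the 30-day window is an arbitrary assumption.

```sql
-- Sketch: users or service accounts that submitted load jobs in the last 30 days.
SELECT DISTINCT
  user_email
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'LOAD';
```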

Example: Bytes billed per user identity

The following example shows the total bytes billed for query jobs per user.

SELECT
  user_email,
  SUM(total_bytes_billed) AS bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY
  user_email;

The results should look like the following:

+---------------------+--------------+
| user_email          | bytes_billed |
+---------------------+--------------+
| bob@example.com     | 2847932416   |
| alice@example.com   | 1184890880   |
| charles@example.com | 10485760     |
+---------------------+--------------+
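
Raw byte counts can be hard to read. For instance, 2847932416 bytes is 2716 MiB, or about 2.65 GiB. The following sketch reports the same totals in GiB, largest first. The 30-day window, the rounding to two decimals, and the gib_billed column name are assumptions added for illustration.

```sql
-- Sketch: bytes billed per user over the last 30 days, expressed in GiB.
SELECT
  user_email,
  ROUND(SUM(total_bytes_billed) / POW(1024, 3), 2) AS gib_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter on the partitioning column to limit the amount of data scanned.
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'  -- avoid double counting script parent jobs
GROUP BY
  user_email
ORDER BY
  gib_billed DESC;
```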

Example: Hourly breakdown of bytes billed

The following example shows total bytes billed for query jobs, in hourly intervals:

SELECT
  TIMESTAMP_TRUNC(end_time, HOUR) AS time_window,
  SUM(total_bytes_billed) AS bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY
  time_window
ORDER BY
  time_window DESC;

The result is similar to the following:

+-------------------------+--------------+
| time_window             | bytes_billed |
+-------------------------+--------------+
| 2022-05-17 20:00:00 UTC | 1967128576   |
| 2022-05-10 21:00:00 UTC | 0            |
| 2022-04-15 20:00:00 UTC | 10485760     |
| 2022-04-15 17:00:00 UTC | 41943040     |
+-------------------------+--------------+

Example: Query jobs per table

The following example shows how many times each table queried in my_project was referenced by a query job:

SELECT
  t.project_id,
  t.dataset_id,
  t.table_id,
  COUNT(*) AS num_references
FROM
  my_project.`region-us`.INFORMATION_SCHEMA.JOBS, UNNEST(referenced_tables) AS t
GROUP BY
  t.project_id,
  t.dataset_id,
  t.table_id
ORDER BY
  num_references DESC;

The result is similar to the following:

+------------+------------+----------+----------------+
| project_id | dataset_id | table_id | num_references |
+------------+------------+----------+----------------+
| my_project | dataset1   | orders   | 58             |
| my_project | dataset1   | products | 40             |
| my_project | dataset2   | sales    | 30             |
| other_proj | dataset1   | accounts | 12             |
+------------+------------+----------+----------------+