JOBS view

The INFORMATION_SCHEMA.JOBS view contains near real-time metadata about all BigQuery jobs in the current project.

Required role

To get the permission that you need to query the INFORMATION_SCHEMA.JOBS view, ask your administrator to grant you the BigQuery Resource Viewer (roles/bigquery.resourceViewer) IAM role on your project. For more information about granting roles, see Manage access.

This predefined role contains the bigquery.jobs.listAll permission, which is required to query the INFORMATION_SCHEMA.JOBS view.

You might also be able to get this permission with custom roles or other predefined roles.

For more information about BigQuery permissions, see Access control with IAM.

Schema

The underlying data is partitioned by the creation_time column and clustered by project_id and user_email. The query_info column contains additional information about your query jobs.

The INFORMATION_SCHEMA.JOBS view has the following schema:

Column name Data type Value
bi_engine_statistics RECORD If the project is configured to use the BI Engine SQL Interface, then this field contains BiEngineStatistics. Otherwise NULL.
cache_hit BOOLEAN Whether the query results of this job were from a cache. If you have a multi-query statement job, cache_hit for your parent query is NULL.
creation_time TIMESTAMP (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
destination_table RECORD Destination table for results, if any.
dml_statistics RECORD If the job is a query with a DML statement, the value is a record with the following fields:
  • inserted_row_count: The number of rows that were inserted.
  • deleted_row_count: The number of rows that were deleted.
  • updated_row_count: The number of rows that were updated.
For all other jobs, the value is NULL.
This column is present in the INFORMATION_SCHEMA.JOBS_BY_USER and INFORMATION_SCHEMA.JOBS_BY_PROJECT views.
end_time TIMESTAMP The end time of this job, in milliseconds since the epoch. This field represents the time when the job enters the DONE state.
error_result RECORD Details of any errors as ErrorProto objects.
job_id STRING The ID of the job. For example, bquxjob_1234.
job_stages RECORD Query stages of the job.
job_type STRING The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or NULL. A NULL value indicates an internal job, such as a script job statement evaluation or a materialized view refresh.
labels RECORD Array of labels applied to the job as key-value pairs.
parent_job_id STRING ID of the parent job, if any.
priority STRING The priority of this job. Valid values include INTERACTIVE and BATCH.
project_id STRING (Clustering column) The ID of the project.
project_number INTEGER The number of the project.
query STRING SQL query text. Only the JOBS_BY_PROJECT view has the query column.
referenced_tables RECORD Array of tables referenced by the job. Only populated for query jobs.
reservation_id STRING Name of the primary reservation assigned to this job, in the format RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME.
In this output:
  • RESERVATION_ADMIN_PROJECT: the name of the Google Cloud project that administers the reservation
  • RESERVATION_LOCATION: the location of the reservation
  • RESERVATION_NAME: the name of the reservation
session_info RECORD Details about the session in which this job ran, if any. (Preview)
start_time TIMESTAMP The start time of this job, in milliseconds since the epoch. This field represents the time when the job transitions from the PENDING state to either RUNNING or DONE.
state STRING Running state of the job. Valid states include PENDING, RUNNING, and DONE.
statement_type STRING The type of query statement. For example, DELETE, INSERT, SCRIPT, SELECT, or UPDATE. See QueryStatementType for list of valid values.
timeline RECORD Query timeline of the job. Contains snapshots of query execution.
total_bytes_billed INTEGER If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only.

Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery.

total_bytes_processed INTEGER Total bytes processed by the job.
total_modified_partitions INTEGER The total number of partitions the job modified. This field is populated for LOAD and QUERY jobs.
total_slot_ms INTEGER Slot milliseconds for the job over its entire duration in the RUNNING state, including retries.
transaction_id STRING ID of the transaction in which this job ran, if any. (Preview)
user_email STRING (Clustering column) Email address or service account of the user who ran the job.
query_info.resource_warning STRING The warning message that appears if the resource usage during query processing is above the internal threshold of the system.
A successful query job can have the resource_warning field populated. With resource_warning, you get additional data points to optimize your queries and to set up monitoring for performance trends of an equivalent set of queries by using query_hashes.
query_info.query_hashes.normalized_literals STRING Contains the hashes of the query. normalized_literals is a hexadecimal STRING hash that ignores comments, parameter values, UDFs, and literals.
This field appears for successful GoogleSQL queries that are not cache hits.
query_info.performance_insights RECORD Performance insights for the job.
transferred_bytes INTEGER Total bytes transferred for cross-cloud queries, such as BigQuery Omni cross-cloud transfer jobs.
materialized_view_statistics RECORD Statistics of materialized views considered in a query job. (Preview)

When you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, exclude the SCRIPT statement type, otherwise some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of this job.

Multi-statement query job

A multi-statement query job is a query job that uses the procedural language. Multi-statement query jobs often define variables with DECLARE or have control flow statements such as IF or WHILE. When you query INFORMATION_SCHEMA.JOBS, you might need to recognize the difference between a multi-statement query job and other jobs. A multi-statement query job has the following traits:

  • statement_type = SCRIPT
  • reservation_id = NULL
  • Child jobs. Each of a multi-statement query job's child jobs has a parent_job_id pointing to the multi-statement query job itself. This includes summary values for all child jobs that were executed as part of this job. For this reason, if you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, then you should exclude the SCRIPT statement type, otherwise some values such as total_slot_ms might be counted twice.

Data retention

This view contains currently running jobs and the job history of the past 180 days.

Scope and syntax

Queries against this view must include a region qualifier. The following table explains the region scope for this view:

View name Resource scope Region scope
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.JOBS[_BY_PROJECT] Project level REGION
Replace the following:

  • Optional: PROJECT_ID: the ID of your Google Cloud project. If not specified, the default project is used.
  • REGION: any dataset region name. For example, region-us.

Examples

To run the query against a project other than your default project, add the project ID in the following format:

`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
Replace the following:

  • PROJECT_ID: the ID of the project.
  • REGION_NAME: the region for your project.

For example, `myproject`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT.

Calculate average slot utilization

The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.

To run the query:

SELECT
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_slots
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter by the partition column first to limit the amount of data scanned.
  -- Eight days allows for jobs created before the 7 day end_time filter.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP();

The result is similar to the following:

+------------+
| avg_slots  |
+------------+
| 3879.1534  |
+------------+

You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful to determine percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage from its children jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.

If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).

Load job history

The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history (for example, the last 30 days).

SELECT
  DISTINCT(user_email) AS user
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'LOAD';

The result is similar to the following:

+--------------+
| user         |
+--------------+
| abc@xyz.com  |
+--------------+
| def@xyz.com  |
+--------------+

Get the number of load jobs to determine the daily job quota used

The following example returns the number of jobs by day, dataset, and table so that you can determine how much of the daily job quota is used.

SELECT
    DATE(creation_time) as day,
    destination_table.project_id as project_id,
    destination_table.dataset_id as dataset_id,
    destination_table.table_id as table_id,
    COUNT(job_id) AS load_job_count
 FROM
   region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
    creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
    AND job_type = "LOAD"
GROUP BY
    day,
    project_id,
    dataset_id,
    table_id
ORDER BY
    day DESC

Get the last 10 failed jobs

The following example shows the last 10 failed jobs:

SELECT
   job_id,
  creation_time,
  user_email,
   error_result
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE error_result.reason != "Null"
ORDER BY creation_time DESC
LIMIT 10

The results should look similar to the following:

+---------------+--------------------------+------------------+-----------------------------------------------------------+
| job_id        | creation_time            | user_email       | error_result                                              |
+---------------+--------------------------+------------------+-----------------------------------------------------------+
| examplejob_1  | 2020-10-10 00:00:00 UTC  | bob@example.com  | Column 'generate_metadata_snapshot' has mismatched type...|
| examplejob_2  | 2020-10-11 00:00:00 UTC  | bob@example.com  | Column 'generate_metadata_snapshot' has mismatched type...|
+---------------+--------------------------+------------------+-----------------------------------------------------------+

Query the list of long running jobs

The following example shows the list of long running jobs that are in the RUNNING or PENDING state for more than 30 minutes:

SELECT
  job_id,
  job_type,
  state,
  creation_time,
  start_time,
  user_email
 FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
 WHERE
    state!="DONE" AND
    creation_time <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
ORDER BY creation_time ASC;

The result is similar to the following:

+---------------+----------+---------+--------------------------------+--------------------------------+------------------+
| job_id        | job_type | state   | creation_time                  | start_time                     | user_email       |
+---------------+----------+---------+--------------------------------+--------------------------------+------------------+
| examplejob_1  | QUERY    | RUNNING | 2023-05-03 05:07:22.818000 UTC | 2023-05-03 05:07:22.905000 UTC | abc@example.com  |
| examplejob_2  | QUERY    | PENDING | 2023-05-01 02:05:47.925000 UTC | 2023-05-01 02:05:47.998000 UTC | xyz@example.com  |
+---------------+----------+---------+--------------------------------+--------------------------------+------------------+

Bytes processed per user identity

The following example shows the total bytes billed for query jobs per user:

SELECT
  user_email,
  SUM(total_bytes_billed) AS bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY
  user_email;

Note: Please see the caveat for the total_bytes_billed column in the schema documentation for the JOBS views.

The results should look like the following:

+---------------------+--------------+
| user_email          | bytes_billed |
+---------------------+--------------+
| bob@example.com     | 2847932416   |
| alice@example.com   | 1184890880   |
| charles@example.com | 10485760     |
+---------------------+--------------+

Hourly breakdown of bytes processed

The following example shows total bytes billed for query jobs, in hourly intervals:

SELECT
  TIMESTAMP_TRUNC(end_time, HOUR) AS time_window,
  SUM(total_bytes_billed) AS bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY
  time_window
ORDER BY
  time_window DESC;

The result is similar to the following:

+-------------------------+--------------+
| time_window             | bytes_billed |
+-------------------------+--------------+
| 2022-05-17 20:00:00 UTC | 1967128576   |
| 2022-05-10 21:00:00 UTC | 0            |
| 2022-04-15 20:00:00 UTC | 10485760     |
| 2022-04-15 17:00:00 UTC | 41943040     |
+-------------------------+--------------+

Query jobs per table

The following example shows how many times each table queried in my_project was referenced by a query job:

SELECT
  t.project_id,
  t.dataset_id,
  t.table_id,
  COUNT(*) AS num_references
FROM
  my_project.`region-us`.INFORMATION_SCHEMA.JOBS, UNNEST(referenced_tables) AS t
GROUP BY
  t.project_id,
  t.dataset_id,
  t.table_id
ORDER BY
  num_references DESC;

The result is similar to the following:

+------------+------------+----------+----------------+
| project_id | dataset_id | table_id | num_references |
+------------+------------+----------+----------------+
| my_project | dataset1   | orders   | 58             |
| my_project | dataset1   | products | 40             |
| my_project | dataset2   | sales    | 30             |
| other_proj | dataset1   | accounts | 12             |
+------------+------------+----------+----------------+

Most expensive queries by project

The following example lists the most expensive queries in my_project by slot usage time:

SELECT
 job_id,
 query,
 user_email,
 total_slot_ms
FROM `my_project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXTRACT(DATE FROM  creation_time) = current_date()
ORDER BY total_slot_ms DESC
LIMIT 4

You can also list the most expensive queries by data processed with the following example:

SELECT
 job_id,
 query,
 user_email,
 total_bytes_processed
FROM `my_project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXTRACT(DATE FROM  creation_time) = current_date()
ORDER BY total_bytes_processed DESC
LIMIT 4

The result for either example is similar to the following:

+--------------+---------------------------------+-----------------------+---------------+
| job_id       | query                           | user_email            | total_slot_ms |
+--------------+---------------------------------+--------------------------+------------+
| examplejob_1 | SELECT ... FROM dataset.table1  | bob@example.com       | 80,000        |
| examplejob_2 | SELECT ... FROM dataset.table2  | alice@example.com     | 78,000        |
| examplejob_3 | SELECT ... FROM dataset.table3  | charles@example.com   | 75,000        |
| examplejob_4 | SELECT ... FROM dataset.table4  | tina@example.com      | 72,000        |
+--------------+---------------------------------+-----------------------+---------------+

Get details about a resource warning

If you get a Resources exceeded error message, you can inquire about the queries in a time window:

SELECT
  query,
  query_info.resource_warning
FROM
  `user_project.region-us`.INFORMATION_SCHEMA.JOBS
WHERE
 creation_time BETWEEN TIMESTAMP("2022-12-01")
 AND TIMESTAMP("2022-12-08")
 AND query_info.resource_warning IS NOT NULL
LIMIT
 50;

Monitor resource warnings grouped by date

If you get a Resources exceeded error message, you can monitor the total number of resource warnings grouped by date to know if there are any changes to workload:

WITH resource_warnings AS (
  SELECT
    EXTRACT(DATE FROM creation_time) AS creation_date
  FROM
    `user_project.region-us`.INFORMATION_SCHEMA.JOBS
  WHERE
    creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
    AND query_info.resource_warning IS NOT NULL
)
SELECT
  creation_date,
  COUNT(1) AS warning_counts
FROM
  resource_warnings
GROUP BY creation_date
ORDER BY creation_date DESC;

Estimate slot usage and cost for queries

The following example computes the average slots and max slots for each job by using estimated_runnable_units.

The reservation_id is NULL if you don't have any reservations.

SELECT
  project_id,
  job_id,
  reservation_id,
  EXTRACT(DATE FROM creation_time) AS creation_date,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_seconds,
  job_type,
  user_email,
  total_bytes_billed,

  -- Average slot utilization per job is calculated by dividing total_slot_ms by the millisecond duration of the job

  SAFE_DIVIDE(job.total_slot_ms,(TIMESTAMP_DIFF(job.end_time, job.start_time, MILLISECOND))) AS job_avg_slots,
  query,

  -- Determine the max number of slots used at ANY stage in the query.
  -- The average slots might be 55. But a single stage might spike to 2000 slots.
  -- This is important to know when estimating number of slots to purchase.

  MAX(SAFE_DIVIDE(unnest_job_stages.slot_ms,unnest_job_stages.end_ms - unnest_job_stages.start_ms)) AS jobstage_max_slots,

  -- Check if there's a job that requests more units of works (slots). If so you need more slots.
  -- estimated_runnable_units = Units of work that can be scheduled immediately.
  -- Providing additional slots for these units of work accelerates the query,
  -- if no other query in the reservation needs additional slots.

  MAX(unnest_timeline.estimated_runnable_units) AS estimated_runnable_units
FROM `region-us`.INFORMATION_SCHEMA.JOBS AS job
  CROSS JOIN UNNEST(job_stages) as unnest_job_stages
  CROSS JOIN UNNEST(timeline) AS unnest_timeline
WHERE project_id = 'my_project'
  AND statement_type != 'SCRIPT'
  AND DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
GROUP BY 1,2,3,4,5,6,7,8,9,10
ORDER BY job_id;

View performance insights for queries

The following example returns all query jobs that have performance insights from your project in the last 30 days, along with a URL that links to the query execution graph in the Google Cloud console.

SELECT
  `bigquery-public-data`.persistent_udfs.job_url(
    project_id || ':us.' || job_id) AS job_url,
  query_info.performance_insights
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  DATE(creation_time) >= CURRENT_DATE - 30 -- scan 30 days of query history
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND error_result IS NULL
  AND statement_type != 'SCRIPT'
  AND EXISTS ( -- Only include queries which had performance insights
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_standalone_insights
    )
    WHERE slot_contention OR insufficient_shuffle_quota
    UNION ALL
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_change_insights
    )
    WHERE input_data_change.records_read_diff_percentage IS NOT NULL
  );

View metadata refresh jobs

The following example lists the metadata refresh jobs:

SELECT
 *
FROM
 `region-aws-us-east-1.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE
 job_id LIKE '%metadata_cache_refresh%'
 AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
ORDER BY start_time desc
LIMIT 10;

Analyze performance over time for identical queries

The following example returns the top 10 slowest jobs over the past 7 days that have run the same query:

DECLARE querytext STRING DEFAULT(
  SELECT query
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE job_id = 'JOB_ID'
  LIMIT 1
);

SELECT
  start_time,
  end_time,
  project_id,
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_secs,
  total_bytes_processed / POW(1024, 3) AS total_gigabytes_processed,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  query = querytext
  AND total_bytes_processed > 0
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY 5 DESC
LIMIT 10;

Replace JOB_ID with any job_id that ran the query you are analyzing.