JOBS_BY_FOLDER view

The INFORMATION_SCHEMA.JOBS_BY_FOLDER view contains near real-time metadata about all jobs submitted in the parent folder of the current project, including the jobs in subfolders under it.

Required role

To get the permission that you need to query the INFORMATION_SCHEMA.JOBS_BY_FOLDER view, ask your administrator to grant you the BigQuery Resource Viewer (roles/bigquery.resourceViewer) IAM role on your parent folder. For more information about granting roles, see Manage access.

This predefined role contains the bigquery.jobs.listAll permission, which is required to query the INFORMATION_SCHEMA.JOBS_BY_FOLDER view.

You might also be able to get this permission with custom roles or other predefined roles.

For more information about BigQuery permissions, see Access control with IAM.

Schema

The underlying data is partitioned by the creation_time column and clustered by project_id and user_email. The query_info column contains additional information about your query jobs.
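Because the data is partitioned by creation_time and clustered by project_id and user_email, filters on those columns are a natural way to narrow a query. The following is a minimal sketch rather than a tuned query. The project ID `my-project` and the email address are hypothetical placeholders, and the sketch makes no claim about how much scanned data is actually saved.

```sql
-- Sketch: filter on the partitioning column (creation_time) and on the
-- clustering columns (project_id, user_email).
-- 'my-project' and 'analyst@example.com' are hypothetical placeholders.
SELECT
  job_id,
  job_type,
  state,
  total_bytes_processed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND project_id = 'my-project'
  AND user_email = 'analyst@example.com';
```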

The INFORMATION_SCHEMA.JOBS_BY_FOLDER view has the following schema:

Column name	Data type	Value
`bi_engine_statistics`	`RECORD`	If the project is configured to use the BI Engine SQL Interface, then this field contains BiEngineStatistics. Otherwise `NULL`.
`cache_hit`	`BOOLEAN`	Whether the query results of this job were from a cache. If you have a multi-query statement job, `cache_hit` for your parent query is `NULL`.
`creation_time`	`TIMESTAMP`	(Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp.
`destination_table`	`RECORD`	Destination table for results, if any.
`end_time`	`TIMESTAMP`	The end time of this job, in milliseconds since the epoch. This field represents the time when the job enters the `DONE` state.
`error_result`	`RECORD`	Details of any errors as ErrorProto objects.
`folder_numbers`	`REPEATED INTEGER`	Number IDs of folders that contain the project, starting with the folder that immediately contains the project, followed by the folder that contains the child folder, and so forth. For example, if `folder_numbers` is `[1, 2, 3]`, then folder `1` immediately contains the project, folder `2` contains `1`, and folder `3` contains `2`. This column is only populated in `JOBS_BY_FOLDER`.
`job_id`	`STRING`	The ID of the job. For example, `bquxjob_1234`.
`job_stages`	`RECORD`	Query stages of the job. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery.
`job_type`	`STRING`	The type of the job. Can be `QUERY`, `LOAD`, `EXTRACT`, `COPY`, or `NULL`. A `NULL` value indicates an internal job, such as a script job statement evaluation or a materialized view refresh.
`labels`	`RECORD`	Array of labels applied to the job as key-value pairs.
`parent_job_id`	`STRING`	ID of the parent job, if any.
`priority`	`STRING`	The priority of this job. Valid values include `INTERACTIVE` and `BATCH`.
`project_id`	`STRING`	(Clustering column) The ID of the project.
`project_number`	`INTEGER`	The number of the project.
`query`	`STRING`	SQL query text. Only the `JOBS_BY_PROJECT` view has the query column.
`referenced_tables`	`RECORD`	Array of tables referenced by the job. Only populated for query jobs that are not cache hits.
`reservation_id`	`STRING`	Name of the primary reservation assigned to this job, in the format `RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME`. In this format, `RESERVATION_ADMIN_PROJECT` is the name of the Google Cloud project that administers the reservation, `RESERVATION_LOCATION` is the location of the reservation, and `RESERVATION_NAME` is the name of the reservation.
`session_info`	`RECORD`	Details about the session in which this job ran, if any. (Preview)
`start_time`	`TIMESTAMP`	The start time of this job, in milliseconds since the epoch. This field represents the time when the job transitions from the `PENDING` state to either `RUNNING` or `DONE`.
`state`	`STRING`	Running state of the job. Valid states include `PENDING`, `RUNNING`, and `DONE`.
`statement_type`	`STRING`	The type of query statement. For example, `DELETE`, `INSERT`, `SCRIPT`, `SELECT`, or `UPDATE`. See QueryStatementType for list of valid values.
`timeline`	`RECORD`	Query timeline of the job. Contains snapshots of query execution.
`total_bytes_billed`	`INTEGER`	If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery.
`total_bytes_processed`	`INTEGER`	Total bytes processed by the job. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery.
`total_modified_partitions`	`INTEGER`	The total number of partitions the job modified. This field is populated for `LOAD` and `QUERY` jobs.
`total_slot_ms`	`INTEGER`	Slot milliseconds for the job over its entire duration in the `RUNNING` state, including retries.
`transaction_id`	`STRING`	ID of the transaction in which this job ran, if any. (Preview)
`user_email`	`STRING`	(Clustering column) Email address or service account of the user who ran the job.
`query_info.resource_warning`	`STRING`	The warning message that appears if the resource usage during query processing is above the internal threshold of the system. A successful query job can have the `resource_warning` field populated. With `resource_warning`, you get additional data points to optimize your queries and to set up monitoring for performance trends of an equivalent set of queries by using `query_hashes`.
`query_info.query_hashes.normalized_literals`	`STRING`	Contains the hashes of the query. `normalized_literals` is a hexadecimal `STRING` hash that ignores comments, parameter values, UDFs, and literals. This field appears for successful GoogleSQL queries that are not cache hits.
`query_info.performance_insights`	`RECORD`	Performance insights for the job.
`query_info.optimization_details`	`STRUCT`	The history-based optimizations for the job.
`transferred_bytes`	`INTEGER`	Total bytes transferred for cross-cloud queries, such as BigQuery Omni cross-cloud transfer jobs.
`materialized_view_statistics`	`RECORD`	Statistics of materialized views considered in a query job. (Preview)
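
The `folder_numbers` column described in the schema is an array ordered from the nearest folder outward, so it can be used to restrict results to projects under one folder and to see how deeply a project is nested. The following sketch assumes a hypothetical folder number, `123456789012`; substitute a folder number from your own hierarchy.

```sql
-- Sketch: recent jobs from projects located anywhere under one folder.
-- 123456789012 is a hypothetical folder number.
SELECT
  job_id,
  project_id,
  -- First element: the folder that immediately contains the project.
  folder_numbers[SAFE_OFFSET(0)] AS immediate_folder_number,
  -- Number of folders between the project and the top of the hierarchy.
  ARRAY_LENGTH(folder_numbers) AS folder_depth
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND 123456789012 IN UNNEST(folder_numbers);
```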

Data retention

This view contains currently running jobs and the job history of the past 180 days.

Scope and syntax

Queries against this view must include a region qualifier. The following table explains the region scope for this view:

View name	Resource scope	Region scope
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.JOBS_BY_FOLDER	Folder that contains the specified project	`REGION`

Replace the following:

Optional: PROJECT_ID: the ID of your Google Cloud project. If not specified, the default project is used.
REGION: any dataset region name. For example, us, which gives the region qualifier region-us.

Note: You must specify the location to query the INFORMATION_SCHEMA views and the location of the query execution must match the region of the INFORMATION_SCHEMA view.
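
As an illustration of the syntax above, the following sketch qualifies the view with an explicit project and a region other than `us`. The project ID `my-project` is a hypothetical placeholder, and the sketch assumes that the query job itself runs in the EU multi-region so that the execution location matches the region qualifier.

```sql
-- Sketch: the view addressed with an explicit project and region.
-- 'my-project' is a hypothetical project ID.
-- Run this query in the EU location so that it matches `region-eu`.
SELECT
  job_id,
  creation_time
FROM
  `my-project`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
```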

Example

The following query displays the job ID, creation time, and state (PENDING, RUNNING, or DONE) of all interactive jobs in the designated project's folder:

SELECT
  job_id,
  creation_time,
  state
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  priority = 'INTERACTIVE';

The result is similar to the following:

+--------------+---------------------------+---------------------------------+
| job_id       |  creation_time            |  state                          |
+--------------+---------------------------+---------------------------------+
| bquxjob_1    |  2019-10-10 00:00:00 UTC  |  DONE                           |
| bquxjob_2    |  2019-10-10 00:00:01 UTC  |  DONE                           |
| bquxjob_3    |  2019-10-10 00:00:02 UTC  |  DONE                           |
| bquxjob_4    |  2019-10-10 00:00:03 UTC  |  RUNNING                        |
| bquxjob_5    |  2019-10-10 00:00:04 UTC  |  PENDING                        |
+--------------+---------------------------+---------------------------------+
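
Because the view spans every project in the folder, the same pattern can be extended to compare usage across projects and users. The following sketch sums `total_slot_ms` per project and user over the last seven days. The seven-day window and the conversion to slot hours are illustrative choices, and excluding `SCRIPT` statements rests on the assumption that parent script jobs would otherwise be counted alongside their child jobs.

```sql
-- Sketch: slot usage per project and user across the folder.
SELECT
  project_id,
  user_email,
  COUNT(*) AS job_count,
  -- Convert slot milliseconds to slot hours.
  SUM(total_slot_ms) / (1000 * 60 * 60) AS total_slot_hours
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  -- Assumption: skip parent script jobs to avoid counting child jobs twice.
  AND statement_type != 'SCRIPT'
GROUP BY
  project_id,
  user_email
ORDER BY
  total_slot_hours DESC;
```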

Get bytes processed by export jobs

The following example computes the total_bytes_processed value for EXTRACT job types. For information about quotas for export jobs, see Quota policy for export jobs. You can use the total bytes processed to monitor aggregate usage and to ensure that export jobs stay below the 50 TB per-day limit:

SELECT
  DATE(creation_time) AS day,
  project_id AS source_project_id,
  SUM(total_bytes_processed) AS total_bytes_processed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Limit the scan to the last 8 days of job history.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = 'EXTRACT'
GROUP BY
  day,
  source_project_id
ORDER BY
  day DESC;
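
To monitor the limit across the whole folder, a variant of this aggregation can run against JOBS_BY_FOLDER and return only the days that approach the limit. The following is a sketch under stated assumptions: the threshold of 40 is a hypothetical alerting value, and the conversion treats 1 TB as 10^12 bytes, which might not match how the quota itself is measured.

```sql
-- Sketch: days on which a project's export jobs approached the daily limit.
SELECT
  DATE(creation_time) AS day,
  project_id AS source_project_id,
  -- Assumption: 1 TB = 10^12 bytes.
  SUM(total_bytes_processed) / POW(10, 12) AS total_tb_processed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY)
  AND job_type = 'EXTRACT'
GROUP BY
  day,
  source_project_id
HAVING
  -- 40 is a hypothetical alert threshold, chosen below the 50 TB limit.
  total_tb_processed > 40
ORDER BY
  day DESC;
```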

Get usage of copy jobs

For information about copy jobs, see Copy a table. The following example provides the usage of copy jobs:

SELECT
  DATE(creation_time) AS day,
  project_id AS source_project_id,
  -- Build a PROJECT:DATASET.TABLE name for the copy destination.
  CONCAT(destination_table.project_id, ":", destination_table.dataset_id, ".", destination_table.table_id) AS destination_table,
  COUNT(job_id) AS copy_job_count
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  -- Limit the scan to the last 8 days of job history.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = 'COPY'
GROUP BY
  day,
  source_project_id,
  destination_table
ORDER BY
  day DESC;

View performance insights for queries

The following example returns all query jobs that have performance insights from the designated project's folder in the last 30 days, along with a URL that links to the query execution graph in the Google Cloud console.

SELECT
  `bigquery-public-data`.persistent_udfs.job_url(
    project_id || ':us.' || job_id) AS job_url,
  query_info.performance_insights
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_FOLDER
WHERE
  DATE(creation_time) >= CURRENT_DATE - 30 -- scan 30 days of query history
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND error_result IS NULL
  AND statement_type != 'SCRIPT'
  AND EXISTS ( -- Only include queries which had performance insights
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_standalone_insights
    )
    WHERE slot_contention OR insufficient_shuffle_quota
    UNION ALL
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_change_insights
    )
    WHERE input_data_change.records_read_diff_percentage IS NOT NULL
  );