Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::QueryJob.
QueryJob
A Job subclass representing a query operation that may be performed on a Table. A QueryJob instance is created when you call Project#query_job or Dataset#query_job.
Inherits
- Job
Examples
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "SELECT COUNT(word) as count FROM " \
                         "`bigquery-public-data.samples.shakespeare`"

job.wait_until_done!

if job.failed?
  puts job.error
else
  puts job.data.first
end
With multiple statements and child jobs:
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

multi_statement_sql = <<~SQL
  -- Declare a variable to hold names as an array.
  DECLARE top_names ARRAY<STRING>;
  -- Build an array of the top 100 names from the year 2017.
  SET top_names = (
    SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year = 2017
  );
  -- Which names appear as words in Shakespeare's plays?
  SELECT
    name AS shakespeare_name
  FROM UNNEST(top_names) AS name
  WHERE name IN (
    SELECT word
    FROM `bigquery-public-data.samples.shakespeare`
  );
SQL

job = bigquery.query_job multi_statement_sql

job.wait_until_done!

child_jobs = bigquery.jobs parent_job: job

child_jobs.each do |child_job|
  script_statistics = child_job.script_statistics
  puts script_statistics.evaluation_kind
  script_statistics.stack_frames.each do |stack_frame|
    puts stack_frame.text
  end
end
Methods
#batch?
def batch?() -> Boolean
Checks if the priority for the query is BATCH.
- (Boolean) — true when the priority is BATCH, false otherwise.
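The query priority is set when the job is created, via the priority option of Project#query_job or Dataset#query_job. A minimal sketch, assuming default application credentials and access to the public Shakespeare sample dataset:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Queue the query at BATCH priority; BigQuery starts it when idle
# resources are available, rather than immediately.
job = bigquery.query_job "SELECT word FROM `bigquery-public-data.samples.shakespeare`",
                         priority: "BATCH"

job.batch?       #=> true
job.interactive? #=> false
```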
#bytes_processed
def bytes_processed() -> Integer, nil
The number of bytes processed by the query.
- (Integer, nil) — Total bytes processed for the job.
#cache?
def cache?() -> Boolean
Checks if the query job looks for an existing result in the query cache. For more information, see Query Caching.
- (Boolean) — true when the query cache will be used, false otherwise.
#cache_hit?
def cache_hit?() -> Boolean
Checks if the query results are from the query cache.
- (Boolean) — true when the job statistics indicate a cache hit, false otherwise.
#clustering?
def clustering?() -> Boolean
Checks if the destination table will be clustered.
See Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
- (Boolean) — true when the table will be clustered, false otherwise.
#clustering_fields
def clustering_fields() -> Array<String>, nil
One or more fields on which the destination table should be clustered. When specified together with time-based partitioning, data in the table is first partitioned and subsequently clustered. The order of the returned fields determines the sort order of the data.
BigQuery supports clustering for both partitioned and non-partitioned tables.
See Updater#clustering_fields=, Table#clustering_fields and Table#clustering_fields=.
- (Array<String>, nil) — The clustering fields, or nil if the destination table will not be clustered.
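Clustering fields are set when the query job is created, via the updater yielded by Dataset#query_job (see Updater#clustering_fields=). A sketch, assuming default credentials and an existing dataset named my_dataset; the destination table name is illustrative:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"

destination = dataset.table "clustered_table", skip_lookup: true

job = dataset.query_job "SELECT corpus, word FROM " \
                        "`bigquery-public-data.samples.shakespeare`" do |job|
  job.table = destination
  # Data in the destination table is sorted by these fields, in order.
  job.clustering_fields = ["corpus", "word"]
end

job.wait_until_done!
job.clustering?       #=> true
job.clustering_fields #=> ["corpus", "word"]
```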
#data
def data(token: nil, max: nil, start: nil) -> Google::Cloud::Bigquery::Data
Retrieves the query results for the job.
- token (String) (defaults to: nil) — Page token, returned by a previous call, identifying the result set.
- max (Integer) (defaults to: nil) — Maximum number of results to return.
- start (Integer) (defaults to: nil) — Zero-based index of the starting row to read.
- (Google::Cloud::Bigquery::Data) — An object providing access to data read from the destination table for the job.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!

data = job.data

# Iterate over the first page of results
data.each do |row|
  puts row[:word]
end
# Retrieve the next page of results
data = data.next if data.next?
#ddl?
def ddl?() -> Boolean
Whether the query is a DDL statement.
- (Boolean) — true when the statement is DDL, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

query_job = bigquery.query_job "CREATE TABLE my_table (x INT64)"

query_job.statement_type #=> "CREATE_TABLE"
query_job.ddl? #=> true
#ddl_operation_performed
def ddl_operation_performed() -> String, nil
The DDL operation performed, possibly dependent on the pre-existence of the DDL target. (See #ddl_target_table.) Possible values (new values might be added in the future):
- "CREATE": The query created the DDL target.
- "SKIP": No-op. Example cases: the query is CREATE TABLE IF NOT EXISTS while the table already exists, or the query is DROP TABLE IF EXISTS while the table does not exist.
- "REPLACE": The query replaced the DDL target. Example case: the query is CREATE OR REPLACE TABLE, and the table already exists.
- "DROP": The query deleted the DDL target.
- (String, nil) — The DDL operation performed.
#ddl_target_routine
def ddl_target_routine() -> Google::Cloud::Bigquery::Routine, nil
The DDL target routine, in reference state. (See Routine#reference?.)
Present only for CREATE/DROP FUNCTION/PROCEDURE queries. (See #statement_type.)
- (Google::Cloud::Bigquery::Routine, nil) — The DDL target routine, in reference state.
#ddl_target_table
def ddl_target_table() -> Google::Cloud::Bigquery::Table, nil
The DDL target table, in reference state. (See Table#reference?.)
Present only for CREATE/DROP TABLE/VIEW queries. (See #statement_type.)
- (Google::Cloud::Bigquery::Table, nil) — The DDL target table, in reference state.
#deleted_row_count
def deleted_row_count() -> Integer, nil
The number of deleted rows. Present only for DML statements DELETE, MERGE and TRUNCATE. (See #statement_type.)
- (Integer, nil) — The number of deleted rows, or nil if not applicable.
#destination
def destination() -> Table
The table in which the query results are stored.
- (Table) — A table instance.
#dml?
def dml?() -> Boolean
Whether the query is a DML statement.
- (Boolean) — true when the statement is DML, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

query_job = bigquery.query_job "UPDATE my_table " \
                               "SET x = x + 1 " \
                               "WHERE x IS NOT NULL"

query_job.statement_type #=> "UPDATE"
query_job.dml? #=> true
#dry_run
def dry_run() -> Boolean
If set, don't actually run this job. A valid query will return a mostly empty response with some processing statistics, while an invalid query will return the same error it would if it wasn't a dry run.
- (Boolean) — true when the dry run flag is set for the query job, false otherwise.
#dry_run?
def dry_run?() -> Boolean
If set, don't actually run this job. A valid query will return a mostly empty response with some processing statistics, while an invalid query will return the same error it would if it wasn't a dry run.
- (Boolean) — true when the dry run flag is set for the query job, false otherwise.
#dryrun
def dryrun() -> Boolean
If set, don't actually run this job. A valid query will return a mostly empty response with some processing statistics, while an invalid query will return the same error it would if it wasn't a dry run.
- (Boolean) — true when the dry run flag is set for the query job, false otherwise.
#dryrun?
def dryrun?() -> Boolean
If set, don't actually run this job. A valid query will return a mostly empty response with some processing statistics, while an invalid query will return the same error it would if it wasn't a dry run.
- (Boolean) — true when the dry run flag is set for the query job, false otherwise.
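A dry run is requested when the job is created, via the dryrun option of Project#query_job. A sketch, assuming default credentials; no bytes are billed and no destination table is written:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

# Validate the query and estimate its cost without executing it.
job = bigquery.query_job "SELECT word FROM `bigquery-public-data.samples.shakespeare`",
                         dryrun: true

job.dry_run?        #=> true
job.bytes_processed # estimated bytes the query would process
```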
#encryption
def encryption() -> Google::Cloud::Bigquery::EncryptionConfiguration
The encryption configuration of the destination table.
- (Google::Cloud::Bigquery::EncryptionConfiguration) — Custom encryption configuration (e.g., Cloud KMS keys).
#flatten?
def flatten?() -> Boolean
Checks if the query job flattens nested and repeated fields in the query results. The default is true. If the value is false, #large_results? should return true.
- (Boolean) — true when the job flattens results, false otherwise.
#inserted_row_count
def inserted_row_count() -> Integer, nil
The number of inserted rows. Present only for DML statements INSERT and MERGE. (See #statement_type.)
- (Integer, nil) — The number of inserted rows, or nil if not applicable.
#interactive?
def interactive?() -> Boolean
Checks if the priority for the query is INTERACTIVE.
- (Boolean) — true when the priority is INTERACTIVE, false otherwise.
#large_results?
def large_results?() -> Boolean
Checks if the query job allows arbitrarily large results at a slight cost to performance.
- (Boolean) — true when large results are allowed, false otherwise.
#legacy_sql?
def legacy_sql?() -> Boolean
Checks if the query job is using legacy SQL.
- (Boolean) — true when legacy SQL is used, false otherwise.
#maximum_billing_tier
def maximum_billing_tier() -> Integer, nil
Limits the billing tier for this job. Queries that have resource usage beyond this tier will raise (without incurring a charge). If unspecified, this will be set to your project default. For more information, see High-Compute queries.
- (Integer, nil) — The tier number, or nil for the project default.
#maximum_bytes_billed
def maximum_bytes_billed() -> Integer, nil
Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will raise (without incurring a charge). If nil, this will be set to your project default.
- (Integer, nil) — The number of bytes, or nil for the project default.
#num_dml_affected_rows
def num_dml_affected_rows() -> Integer, nil
The number of rows affected by a DML statement. Present only for DML statements INSERT, UPDATE or DELETE. (See #statement_type.)
- (Integer, nil) — The number of rows affected by a DML statement, or nil if the query is not a DML statement.
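A sketch showing the DML row counts together, assuming default credentials; my_dataset.my_table is illustrative and must be a table you can modify:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

job = bigquery.query_job "UPDATE `my_dataset.my_table` " \
                         "SET x = x + 1 " \
                         "WHERE x IS NOT NULL"
job.wait_until_done!

job.statement_type        #=> "UPDATE"
job.num_dml_affected_rows # total rows affected by the statement
job.updated_row_count     # rows updated (see #updated_row_count)
```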
#query_plan
def query_plan() -> Array<Google::Cloud::Bigquery::QueryJob::Stage>, nil
Describes the execution plan for the query.
- (Array<Google::Cloud::Bigquery::QueryJob::Stage>, nil) — An array containing the stages of the execution plan.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!

stages = job.query_plan
stages.each do |stage|
  puts stage.name
  stage.steps.each do |step|
    puts step.kind
    puts step.substeps.inspect
  end
end
#query_results
def query_results(token: nil, max: nil, start: nil) -> Google::Cloud::Bigquery::Data
Retrieves the query results for the job.
- token (String) (defaults to: nil) — Page token, returned by a previous call, identifying the result set.
- max (Integer) (defaults to: nil) — Maximum number of results to return.
- start (Integer) (defaults to: nil) — Zero-based index of the starting row to read.
- (Google::Cloud::Bigquery::Data) — An object providing access to data read from the destination table for the job.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!

data = job.data

# Iterate over the first page of results
data.each do |row|
  puts row[:word]
end
# Retrieve the next page of results
data = data.next if data.next?
#range_partitioning?
def range_partitioning?() -> Boolean
Checks if the destination table will be range partitioned. See Creating and using integer range partitioned tables.
- (Boolean) — true when the table is range partitioned, false otherwise.
#range_partitioning_end
def range_partitioning_end() -> Integer, nil
The end of range partitioning, exclusive. See Creating and using integer range partitioned tables.
- (Integer, nil) — The end of range partitioning, exclusive, or nil if not range partitioned.
#range_partitioning_field
def range_partitioning_field() -> String, nil
The field on which the destination table will be range partitioned, if any. The field must be a top-level NULLABLE/REQUIRED field. The only supported type is INTEGER/INT64. See Creating and using integer range partitioned tables.
- (String, nil) — The partition field, if a field was configured, or nil if not range partitioned.
#range_partitioning_interval
def range_partitioning_interval() -> Integer, nil
The width of each interval. See Creating and using integer range partitioned tables.
- (Integer, nil) — The width of each interval, for data in range partitions, or nil if not range partitioned.
#range_partitioning_start
def range_partitioning_start() -> Integer, nil
The start of range partitioning, inclusive. See Creating and using integer range partitioned tables.
- (Integer, nil) — The start of range partitioning, inclusive, or nil if not range partitioned.
#standard_sql?
def standard_sql?() -> Boolean
Checks if the query job is using standard SQL.
- (Boolean) — true when standard SQL is used, false otherwise.
#statement_type
def statement_type() -> String, nil
The type of query statement, if valid. Possible values (new values might be added in the future):
- "ALTER_TABLE": DDL statement, see Using Data Definition Language Statements
- "CREATE_MODEL": DDL statement, see Using Data Definition Language Statements
- "CREATE_TABLE": DDL statement, see Using Data Definition Language Statements
- "CREATE_TABLE_AS_SELECT": DDL statement, see Using Data Definition Language Statements
- "CREATE_VIEW": DDL statement, see Using Data Definition Language Statements
- "DELETE": DML statement, see Data Manipulation Language Syntax
- "DROP_MODEL": DDL statement, see Using Data Definition Language Statements
- "DROP_TABLE": DDL statement, see Using Data Definition Language Statements
- "DROP_VIEW": DDL statement, see Using Data Definition Language Statements
- "INSERT": DML statement, see Data Manipulation Language Syntax
- "MERGE": DML statement, see Data Manipulation Language Syntax
- "SELECT": SQL query, see Standard SQL Query Syntax
- "UPDATE": DML statement, see Data Manipulation Language Syntax
- (String, nil) — The type of query statement.
#time_partitioning?
def time_partitioning?() -> Boolean
Checks if the destination table will be time-partitioned. See Partitioned Tables.
- (Boolean) — true when the table will be time-partitioned, false otherwise.
#time_partitioning_expiration
def time_partitioning_expiration() -> Integer, nil
The expiration for the destination table partitions, if any, in seconds. See Partitioned Tables.
- (Integer, nil) — The expiration time, in seconds, for data in partitions, or nil if not present.
#time_partitioning_field
def time_partitioning_field() -> String, nil
The field on which the destination table will be partitioned, if any. If not set, the destination table will be partitioned by the pseudo column _PARTITIONTIME; if set, the table will be partitioned by this field. See Partitioned Tables.
- (String, nil) — The partition field, if a field was configured, or nil if not partitioned or not set (partitioned by pseudo column _PARTITIONTIME).
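The partition field is set when the job is created, via the updater yielded by Dataset#query_job. A sketch, assuming default credentials and an existing dataset named my_dataset; the destination table name is illustrative, and the dob column comes from the generated rows:

```ruby
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset  = bigquery.dataset "my_dataset"

job = dataset.query_job "SELECT * FROM UNNEST(" \
                        "GENERATE_TIMESTAMP_ARRAY(" \
                        "'2018-10-01 00:00:00', " \
                        "'2018-10-10 00:00:00', " \
                        "INTERVAL 1 DAY)) AS dob" do |job|
  job.table = dataset.table "partitioned_table", skip_lookup: true
  # Partition the destination table by day on the dob column.
  job.time_partitioning_type  = "DAY"
  job.time_partitioning_field = "dob"
end

job.wait_until_done!
job.time_partitioning?      #=> true
job.time_partitioning_field #=> "dob"
```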
#time_partitioning_require_filter?
def time_partitioning_require_filter?() -> Boolean
If set to true, queries over the destination table will require a partition filter that can be used for partition elimination to be specified. See Partitioned Tables.
- (Boolean) — true when a partition filter will be required, false otherwise.
#time_partitioning_type
def time_partitioning_type() -> String, nil
The period for which the destination table will be partitioned, if any. See Partitioned Tables.
- (String, nil) — The partition type. The supported types are DAY, HOUR, MONTH, and YEAR, which will generate one partition per day, hour, month, and year, respectively; or nil if not present.
#udfs
def udfs() -> Array<String>
The user-defined function resources used in the query. May be either a code resource to load from a Google Cloud Storage URI (gs://bucket/path), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code. See User-Defined Functions.
- (Array<String>) — An array containing Google Cloud Storage URIs and/or inline source code.
#updated_row_count
def updated_row_count() -> Integer, nil
The number of updated rows. Present only for DML statements UPDATE and MERGE. (See #statement_type.)
- (Integer, nil) — The number of updated rows, or nil if not applicable.
#wait_until_done!
def wait_until_done!()
Refreshes the job until the job is DONE. The delay between refreshes will incrementally increase.

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare`"
job = bigquery.query_job sql

job.wait_until_done!
job.done? #=> true