Reference documentation and code samples for the google-cloud-bigquery class Google::Cloud::Bigquery::Dataset.
Dataset
Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
Inherits
- Object
Example
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.create_dataset "my_dataset", name: "My Dataset", description: "This is my Dataset"
Methods
#access
def access() { |access| ... } -> Google::Cloud::Bigquery::Dataset::Access
Retrieves the access rules for a Dataset. The rules can be updated when passing a block, see Access for all the methods available.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- (access) — a block for setting rules
- access (Dataset::Access) — the object accepting rules
- (Google::Cloud::Bigquery::Dataset::Access) — The access object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" access = dataset.access access.writer_user? "reader@example.com" #=> false
Manage the access rules by passing a block:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" dataset.access do |access| access.add_owner_group "owners@example.com" access.add_writer_user "writer@example.com" access.remove_writer_user "readers@example.com" access.add_reader_special :all access.add_reader_view other_dataset_view_object end
#api_url
def api_url() -> String, nil
A URL that can be used to access the dataset using the REST API.
-
(String, nil) — A REST URL for the resource, or
nil
if the object is a reference (see #reference?).
#create_materialized_view
def create_materialized_view(table_id, query, name: nil, description: nil, enable_refresh: nil, refresh_interval_ms: nil) -> Google::Cloud::Bigquery::Table
Creates a new materialized view.
Materialized views are precomputed views that periodically cache results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only delta changes from the base table to compute up-to-date results.
Queries that use materialized views are generally faster and consume less resources than queries that retrieve the same data only from the base table. Materialized views are helpful to significantly boost performance of workloads that have the characteristic of common and repeated queries.
For logical views, see #create_view.
-
table_id (String) — The ID of the materialized view table. The ID must contain only letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 1,024 characters. - query (String) — The query that BigQuery executes when the materialized view is referenced.
- name (String) (defaults to: nil) — A descriptive name for the table.
- description (String) (defaults to: nil) — A user-friendly description of the table.
- enable_refresh (Boolean) (defaults to: nil) — Enable automatic refresh of the materialized view when the base table is updated. Optional. The default value is true.
-
refresh_interval_ms (Integer) (defaults to: nil) — The maximum frequency in milliseconds at which this materialized view
will be refreshed. Optional. The default value is
1_800_000
(30 minutes).
- (Google::Cloud::Bigquery::Table) — A new table object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" materialized_view = dataset.create_materialized_view "my_materialized_view", "SELECT name, age FROM proj.dataset.users"
Automatic refresh can be disabled:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" materialized_view = dataset.create_materialized_view "my_materialized_view", "SELECT name, age FROM proj.dataset.users", enable_refresh: false
#create_routine
def create_routine(routine_id) { |routine| ... } -> Google::Cloud::Bigquery::Routine
Creates a new routine. The following attributes may be set in the yielded block: Routine::Updater#routine_type=, Routine::Updater#language=, Routine::Updater#arguments=, Routine::Updater#return_type=, Routine::Updater#imported_libraries=, Routine::Updater#body=, and Routine::Updater#description=.
-
routine_id (String) — The ID of the routine. The ID must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 256 characters.
- (routine) — A block for setting properties on the routine.
- routine (Google::Cloud::Bigquery::Routine::Updater) — An updater to set additional properties on the routine.
- (Google::Cloud::Bigquery::Routine) — A new routine object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routine = dataset.create_routine "my_routine" do |r| r.routine_type = "SCALAR_FUNCTION" r.language = "SQL" r.arguments = [ Google::Cloud::Bigquery::Argument.new(name: "x", data_type: "INT64") ] r.body = "x * 3" r.description = "My routine description" end puts routine.routine_id
Extended example:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routine = dataset.create_routine "my_routine" do |r| r.routine_type = "SCALAR_FUNCTION" r.language = :SQL r.body = "(SELECT SUM(IF(elem.name = \"foo\",elem.val,null)) FROM UNNEST(arr) AS elem)" r.arguments = [ Google::Cloud::Bigquery::Argument.new( name: "arr", argument_kind: "FIXED_TYPE", data_type: Google::Cloud::Bigquery::StandardSql::DataType.new( type_kind: "ARRAY", array_element_type: Google::Cloud::Bigquery::StandardSql::DataType.new( type_kind: "STRUCT", struct_type: Google::Cloud::Bigquery::StandardSql::StructType.new( fields: [ Google::Cloud::Bigquery::StandardSql::Field.new( name: "name", type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "STRING") ), Google::Cloud::Bigquery::StandardSql::Field.new( name: "val", type: Google::Cloud::Bigquery::StandardSql::DataType.new(type_kind: "INT64") ) ] ) ) ) ) ] end
#create_table
def create_table(table_id, name: nil, description: nil) { |table| ... } -> Google::Cloud::Bigquery::Table
Creates a new table. If you are adapting existing code that was written for the Rest API , you can pass the table's schema as a hash (see example.)
-
table_id (String) — The ID of the table. The ID must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 1,024 characters. - name (String) (defaults to: nil) — A descriptive name for the table.
- description (String) (defaults to: nil) — A user-friendly description of the table.
- (table) — a block for setting the table
- table (Google::Cloud::Bigquery::Table::Updater) — An updater to set additional properties on the table in the API request to create it.
- (Google::Cloud::Bigquery::Table) — A new table object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table"
You can also pass name and description options.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table", name: "My Table", description: "A description of table."
Or the table's schema can be configured with the block.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table" do |t| t.schema.string "first_name", mode: :required t.schema.record "cities_lived", mode: :required do |s| s.string "place", mode: :required s.integer "number_of_years", mode: :required end end
You can define the schema using a nested block.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table" do |t| t.name = "My Table" t.description = "A description of my table." t.schema do |s| s.string "first_name", mode: :required s.record "cities_lived", mode: :repeated do |r| r.string "place", mode: :required r.integer "number_of_years", mode: :required end end end
With time partitioning and clustering.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table" do |t| t.schema do |schema| schema.timestamp "dob", mode: :required schema.string "first_name", mode: :required schema.string "last_name", mode: :required end t.time_partitioning_type = "DAY" t.time_partitioning_field = "dob" t.clustering_fields = ["last_name", "first_name"] end
With range partitioning.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table" do |t| t.schema do |schema| schema.integer "my_table_id", mode: :required schema.string "my_table_data", mode: :required end t.range_partitioning_field = "my_table_id" t.range_partitioning_start = 0 t.range_partitioning_interval = 10 t.range_partitioning_end = 100 end
#create_view
def create_view(table_id, query, name: nil, description: nil, standard_sql: nil, legacy_sql: nil, udfs: nil) -> Google::Cloud::Bigquery::Table
Creates a new view, which is a virtual table defined by the given SQL query.
With BigQuery's logical views, the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query. (See Table#view? and Table#query.)
For materialized views, see #create_materialized_view.
-
table_id (String) — The ID of the view table. The ID must contain
only letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 1,024 characters. - query (String) — The query that BigQuery executes when the view is referenced.
- name (String) (defaults to: nil) — A descriptive name for the table.
- description (String) (defaults to: nil) — A user-friendly description of the table.
- standard_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's standard SQL dialect. Optional. The default value is true.
- legacy_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's legacy SQL dialect. Optional. The default value is false.
-
udfs (Array<String>, String) (defaults to: nil) — User-defined function resources
used in a legacy SQL query. May be either a code resource to load from
a Google Cloud Storage URI (
gs://bucket/path
), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code.This parameter is used for defining User Defined Function (UDF) resources only when using legacy SQL. Users of standard SQL should leverage either DDL (e.g.
CREATE [TEMPORARY] FUNCTION ...
) or the Routines API to define UDF resources.For additional information on migrating, see: Migrating to standard SQL - Differences in user-defined JavaScript functions
- (Google::Cloud::Bigquery::Table) — A new table object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" view = dataset.create_view "my_view", "SELECT name, age FROM proj.dataset.users"
A name and description can be provided:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" view = dataset.create_view "my_view", "SELECT name, age FROM proj.dataset.users", name: "My View", description: "This is my view"
#created_at
def created_at() -> Time, nil
The time when this dataset was created.
-
(Time, nil) — The creation time, or
nil
if not present or the object is a reference (see #reference?).
#dataset_id
def dataset_id() -> String
A unique ID for this dataset, without the project name.
-
(String) — The ID must contain only letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 1,024 characters.
#default_encryption
def default_encryption() -> EncryptionConfiguration, nil
The EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless table creation request (or query) overrides it.
Present only if this dataset is using custom default encryption.
-
(EncryptionConfiguration, nil) — The default encryption
configuration.
@!group Attributes
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" encrypt_config = dataset.default_encryption
#default_encryption=
def default_encryption=(value)
Set the EncryptionConfiguration object that represents the default encryption method for all tables and models in the dataset. Once this property is set, all newly-created partitioned tables and models in the dataset will have their encryption set to this value, unless table creation request (or query) overrides it.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- value (EncryptionConfiguration) — The new encryption config.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name dataset.default_encryption = encrypt_config
#default_expiration
def default_expiration() -> Integer, nil
The default lifetime of all tables in the dataset, in milliseconds.
-
(Integer, nil) — The default table expiration in milliseconds,
or
nil
if not present or the object is a reference (see #reference?).
#default_expiration=
def default_expiration=(new_default_expiration)
Updates the default lifetime of all tables in the dataset, in milliseconds.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_default_expiration (Integer) — The new default table expiration in milliseconds.
#delete
def delete(force: nil) -> Boolean
Permanently deletes the dataset. The dataset must be empty before it
can be deleted unless the force
option is set to true
.
-
force (Boolean) (defaults to: nil) — If
true
, delete all the tables in the dataset. Iffalse
and the dataset contains tables, the request will fail. Default isfalse
.
-
(Boolean) — Returns
true
if the dataset was deleted.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" dataset.delete
#description
def description() -> String, nil
A user-friendly description of the dataset.
-
(String, nil) — The description, or
nil
if the object is a reference (see #reference?).
#description=
def description=(new_description)
Updates the user-friendly description of the dataset.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_description (String) — The new description for the dataset.
#etag
def etag() -> String, nil
The ETag hash of the dataset.
-
(String, nil) — The ETag hash, or
nil
if the object is a reference (see #reference?).
#exists?
def exists?(force: false) -> Boolean
Determines whether the dataset exists in the BigQuery service. The
result is cached locally. To refresh state, set force
to true
.
-
force (Boolean) (defaults to: false) — Force the latest resource representation to be
retrieved from the BigQuery service when
true
. Otherwise the return value of this method will be memoized to reduce the number of API calls made to the BigQuery service. The default isfalse
.
-
(Boolean) —
true
when the dataset exists in the BigQuery service,false
otherwise.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true dataset.exists? # true
#external
def external(url, format: nil) -> External::DataSource
Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.
- url (String, Array<String>) — The fully-qualified URL(s) that point to your data in Google Cloud. An attempt will be made to derive the format from the URLs provided.
-
format (String|Symbol) (defaults to: nil) —
The data format. This value will be used even if the provided URLs are recognized as a different format. Optional.
The following values are supported:
csv
- CSVjson
- Newline-delimited JSONavro
- Avrosheets
- Google Sheetsdatastore_backup
- Cloud Datastore backupbigtable
- Bigtable
- (ext)
- (External::DataSource) — External data source.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" csv_url = "gs://bucket/path/to/data.csv" csv_table = dataset.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end data = dataset.query "SELECT * FROM my_ext_table", external: { my_ext_table: csv_table } data.each do |row| puts row[:name] end
#insert
def insert(table_id, rows, insert_ids: nil, skip_invalid: nil, ignore_unknown: nil, autocreate: nil, &block) { |table| ... } -> Google::Cloud::Bigquery::InsertResponse
Inserts data into the given table for near-immediate querying, without the need to complete a load operation before the data can appear in query results.
Simple Ruby types are generally accepted per JSON rules, along with the following support for BigQuery's more complex types:
| BigQuery | Ruby | Notes |
|--------------|--------------------------------------|----------------------------------------------------|
| NUMERIC
| BigDecimal
| BigDecimal
values will be rounded to scale 9. |
| BIGNUMERIC
| String
| Pass as String
to avoid rounding to scale 9. |
| DATETIME
| DateTime
| DATETIME
does not support time zone. |
| DATE
| Date
| |
| GEOGRAPHY
| String
| |
| TIMESTAMP
| Time
| |
| TIME
| Google::Cloud::BigQuery::Time
| |
| BYTES
| File
, IO
, StringIO
, or similar | |
| ARRAY
| Array
| Nested arrays, nil
values are not supported. |
| STRUCT
| Hash
| Hash keys may be strings or symbols. |
Because BigQuery's streaming API is designed for high insertion rates, modifications to the underlying table metadata are eventually consistent when interacting with the streaming system. In most cases metadata changes are propagated within minutes, but during this period API responses may reflect the inconsistent state of the table.
The value :skip
can be provided to skip the generation of IDs for all rows, or to skip the generation of an
ID for a specific row in the array.
- table_id (String) — The ID of the destination table.
-
rows (Hash, Array<Hash>) — A hash object or array of hash objects
containing the data. Required.
BigDecimal
values will be rounded to scale 9 to conform with the BigQueryNUMERIC
data type. To avoid roundingBIGNUMERIC
type values with scale greater than 9, useString
instead ofBigDecimal
. - insert_ids (Array<String|Symbol>, Symbol) (defaults to: nil) — A unique ID for each row. BigQuery uses this property to detect duplicate insertion requests on a best-effort basis. For more information, see data consistency. Optional. If not provided, the client library will assign a UUID to each row before the request is sent.
-
skip_invalid (Boolean) (defaults to: nil) — Insert all valid rows of a request, even
if invalid rows exist. The default value is
false
, which causes the entire request to fail if any invalid rows exist. - ignore_unknown (Boolean) (defaults to: nil) — Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.
-
autocreate (Boolean) (defaults to: nil) — Specifies whether the method should create
a new table with the given
table_id
, if no table is found fortable_id
. The default value is false.
- (table) — a block for setting the table
-
table (Google::Cloud::Bigquery::Table::Updater) — An updater
to set additional properties on the table in the API request to
create it. Only used when
autocreate
is set and the table does not already exist.
- (Google::Cloud::Bigquery::InsertResponse) — An insert response object.
- (ArgumentError)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" rows = [ { "first_name" => "Alice", "age" => 21 }, { "first_name" => "Bob", "age" => 22 } ] dataset.insert "my_table", rows
Avoid retrieving the dataset with skip_lookup
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true rows = [ { "first_name" => "Alice", "age" => 21 }, { "first_name" => "Bob", "age" => 22 } ] dataset.insert "my_table", rows
Using autocreate
to create a new table if none exists.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" rows = [ { "first_name" => "Alice", "age" => 21 }, { "first_name" => "Bob", "age" => 22 } ] dataset.insert "my_table", rows, autocreate: true do |t| t.schema.string "first_name", mode: :required t.schema.integer "age", mode: :required end
Pass BIGNUMERIC
value as a string to avoid rounding to scale 9 in the conversion from BigDecimal
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" row = { "my_numeric" => BigDecimal("123456798.987654321"), "my_bignumeric" => "123456798.98765432100001" # BigDecimal would be rounded, use String instead! } dataset.insert "my_table", row
#insert_async
def insert_async(table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10_000_000, max_rows: 500, interval: 10, threads: 4, &block) { |result| ... } -> Table::AsyncInserter
Create an asynchronous inserter object used to insert rows in batches.
- table_id (String) — The ID of the table to insert rows into.
-
skip_invalid (Boolean) (defaults to: nil) — Insert all valid rows of a request, even
if invalid rows exist. The default value is
false
, which causes the entire request to fail if any invalid rows exist. - ignore_unknown (Boolean) (defaults to: nil) — Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors.
- max_rows (Integer) (defaults to: 500) — The maximum number of rows to be collected before the batch is published. Default is 500.
- (response) — the callback for when a batch of rows is inserted
- result (Table::AsyncInserter::Result) — the result of the asynchronous insert
- (Table::AsyncInserter) — Returns an inserter object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" inserter = dataset.insert_async "my_table" do |result| if result.error? log_error result.error else log_insert "inserted #{result.insert_count} rows " \ "with #{result.error_count} errors" end end rows = [ { "first_name" => "Alice", "age" => 21 }, { "first_name" => "Bob", "age" => 22 } ] inserter.insert rows inserter.stop.wait!
#labels
def labels() -> Hash<String, String>, nil
A hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.
-
(Hash<String, String>, nil) — A hash containing key/value pairs,
or
nil
if the object is a reference (see #reference?).
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" labels = dataset.labels labels["department"] #=> "shipping"
#labels=
def labels=(labels)
Updates the hash of user-provided labels associated with this dataset. Labels are used to organize and group datasets. See Using Labels.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
-
labels (Hash<String, String>) —
A hash containing key/value pairs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" dataset.labels = { "department" => "shipping" }
#load
def load(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, autodetect: nil, null_marker: nil, &block) { |updater| ... } -> Boolean
Loads data into the provided destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #load_job.
For the source of the data, you can pass a google-cloud storage file
path or a google-cloud-storage File
instance. Or, you can upload a
file directly. See Loading Data with a POST
Request.
The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
- table_id (String) — The destination table to load the data into.
- files (File, Google::Cloud::Storage::File, String, URI, Array<Google::Cloud::Storage::File, String, URI>) — A file or the URI of a Google Cloud Storage file, or an Array of those, containing data to load into the table.
-
format (String) (defaults to: nil) —
The exported file format. The default value is
csv
.The following values are supported:
csv
- CSVjson
- Newline-delimited JSONavro
- Avroorc
- ORCparquet
- Parquetdatastore_backup
- Cloud Datastore backup
-
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies how to handle data already present in the table. The default value is
append
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- An error will be returned if the table already contains data.
-
projection_fields (Array<String>) (defaults to: nil) — If the
format
option is set todatastore_backup
, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned. -
jagged_rows (Boolean) (defaults to: nil) — Accept rows that are missing trailing
optional columns. The missing values are treated as nulls. If
false
, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value isfalse
. Only applicable to CSV, ignored for other formats. -
quoted_newlines (Boolean) (defaults to: nil) — Indicates if BigQuery should allow
quoted data sections that contain newline characters in a CSV file.
The default value is
false
. -
autodetect (Boolean) (defaults to: nil) — Indicates if BigQuery should
automatically infer the options and schema for CSV and JSON sources.
The default value is
false
. -
encoding (String) (defaults to: nil) — The character encoding of the data. The
supported values are
UTF-8
orISO-8859-1
. The default value isUTF-8
. -
delimiter (String) (defaults to: nil) — Specifices the separator for fields in a CSV
file. BigQuery converts the string to
ISO-8859-1
encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is,
. -
ignore_unknown (Boolean) (defaults to: nil) —
Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is
false
.The
format
property determines what BigQuery treats as an extra value:CSV
: Trailing columnsJSON
: Named values that don't match any column names
-
max_bad_records (Integer) (defaults to: nil) — The maximum number of bad records
that BigQuery can ignore when running the job. If the number of bad
records exceeds this value, an invalid error is returned in the job
result. The default value is
0
, which requires that all records are valid. -
null_marker (String) (defaults to: nil) — Specifies a string that represents a null
value in a CSV file. For example, if you specify
\N
, BigQuery interprets\N
as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value. -
quote (String) (defaults to: nil) — The value that is used to quote data sections in
a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and
then uses the first byte of the encoded string to split the data in
its raw, binary state. The default value is a double-quote
"
. If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true. -
skip_leading (Integer) (defaults to: nil) — The number of rows at the top of a CSV
file that BigQuery will skip when loading the data. The default
value is
0
. This property is useful if you have header rows in the file that should be skipped. -
schema (Google::Cloud::Bigquery::Schema) (defaults to: nil) — The schema for the
destination table. Optional. The schema can be omitted if the
destination table already exists, or if you're loading data from a
Google Cloud Datastore backup.
See Project#schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.
- (updater) — A block for setting the schema of the destination table and other options for the load job. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.
- updater (Google::Cloud::Bigquery::LoadJob::Updater) — An updater to modify the load job and its schema.
-
(Boolean) — Returns
true
if the load job was successful.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gs_url = "gs://my-bucket/file-name.csv" dataset.load "my_new_table", gs_url do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Pass a google-cloud-storage File
instance:
require "google/cloud/bigquery" require "google/cloud/storage" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" storage = Google::Cloud::Storage.new bucket = storage.bucket "my-bucket" file = bucket.file "file-name.csv" dataset.load "my_new_table", file do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Pass a list of google-cloud-storage files:
require "google/cloud/bigquery" require "google/cloud/storage" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" storage = Google::Cloud::Storage.new bucket = storage.bucket "my-bucket" file = bucket.file "file-name.csv" list = [file, "gs://my-bucket/file-name2.csv"] dataset.load "my_new_table", list do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Upload a file directly:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" file = File.open "my_data.csv" dataset.load "my_new_table", file do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Schema is not required with a Cloud Datastore backup:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" dataset.load "my_new_table", "gs://my-bucket/xxxx.kind_name.backup_info" do |j| j.format = "datastore_backup" end
#load_job
def load_job(table_id, files, format: nil, create: nil, write: nil, projection_fields: nil, jagged_rows: nil, quoted_newlines: nil, encoding: nil, delimiter: nil, ignore_unknown: nil, max_bad_records: nil, quote: nil, skip_leading: nil, schema: nil, job_id: nil, prefix: nil, labels: nil, autodetect: nil, null_marker: nil, dryrun: nil) { |updater| ... } -> Google::Cloud::Bigquery::LoadJob
Loads data into the provided destination table using an asynchronous method. In this method, a LoadJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling #Job#wait_until_done!. See also #load.
For the source of the data, you can pass a google-cloud storage file
path or a google-cloud-storage File
instance. Or, you can upload a
file directly. See Loading Data with a POST
Request.
The geographic location for the job ("US", "EU", etc.) can be set via LoadJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
- table_id (String) — The destination table to load the data into.
- files (File, Google::Cloud::Storage::File, String, URI, Array<Google::Cloud::Storage::File, String, URI>) — A file or the URI of a Google Cloud Storage file, or an Array of those, containing data to load into the table.
-
format (String) (defaults to: nil) —
The exported file format. The default value is
csv
.The following values are supported:
csv
- CSVjson
- Newline-delimited JSONavro
- Avroorc
- ORCparquet
- Parquetdatastore_backup
- Cloud Datastore backup
-
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies how to handle data already present in the table. The default value is
append
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- An error will be returned if the table already contains data.
-
projection_fields (Array<String>) (defaults to: nil) — If the
format
option is set todatastore_backup
, indicates which entity properties to load from a Cloud Datastore backup. Property names are case sensitive and must be top-level properties. If not set, BigQuery loads all properties. If any named property isn't found in the Cloud Datastore backup, an invalid error is returned. -
jagged_rows (Boolean) (defaults to: nil) — Accept rows that are missing trailing
optional columns. The missing values are treated as nulls. If
false
, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value isfalse
. Only applicable to CSV, ignored for other formats. -
quoted_newlines (Boolean) (defaults to: nil) — Indicates if BigQuery should allow
quoted data sections that contain newline characters in a CSV file.
The default value is
false
. -
autodetect (Boolean) (defaults to: nil) — Indicates if BigQuery should
automatically infer the options and schema for CSV and JSON sources.
The default value is
false
. -
encoding (String) (defaults to: nil) — The character encoding of the data. The
supported values are
UTF-8
orISO-8859-1
. The default value isUTF-8
. -
delimiter (String) (defaults to: nil) — Specifices the separator for fields in a CSV
file. BigQuery converts the string to
ISO-8859-1
encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is,
. -
ignore_unknown (Boolean) (defaults to: nil) —
Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is
false
.The
format
property determines what BigQuery treats as an extra value:CSV
: Trailing columnsJSON
: Named values that don't match any column names
-
max_bad_records (Integer) (defaults to: nil) — The maximum number of bad records
that BigQuery can ignore when running the job. If the number of bad
records exceeds this value, an invalid error is returned in the job
result. The default value is
0
, which requires that all records are valid. -
null_marker (String) (defaults to: nil) — Specifies a string that represents a null
value in a CSV file. For example, if you specify
\N
, BigQuery interprets\N
as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value. -
quote (String) (defaults to: nil) — The value that is used to quote data sections in
a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and
then uses the first byte of the encoded string to split the data in
its raw, binary state. The default value is a double-quote
"
. If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true. -
skip_leading (Integer) (defaults to: nil) — The number of rows at the top of a CSV
file that BigQuery will skip when loading the data. The default
value is
0
. This property is useful if you have header rows in the file that should be skipped. -
schema (Google::Cloud::Bigquery::Schema) (defaults to: nil) — The schema for the
destination table. Optional. The schema can be omitted if the
destination table already exists, or if you're loading data from a
Google Cloud Datastore backup.
See Project#schema for the creation of the schema for use with this option. Also note that for most use cases, the block yielded by this method is a more convenient way to configure the schema.
-
job_id (String) (defaults to: nil) — A user-defined ID for the load job. The ID
must contain only letters (
[A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be
prepended to a generated value to produce a unique job ID. For
example, the prefix
daily_import_job_
can be given to generate a job ID such asdaily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh
. The prefix must contain only letters ([A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length of the entire ID is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used. -
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
- dryrun (Boolean) (defaults to: nil) — If set, don't actually run this job. Behavior is undefined however for non-query jobs and may result in an error. Deprecated.
- (updater) — A block for setting the schema and other options for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup.
- updater (Google::Cloud::Bigquery::LoadJob::Updater) — An updater to modify the load job and its schema.
- (Google::Cloud::Bigquery::LoadJob) — A new load job object.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gs_url = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gs_url do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Pass a google-cloud-storage File
instance:
require "google/cloud/bigquery" require "google/cloud/storage" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" storage = Google::Cloud::Storage.new bucket = storage.bucket "my-bucket" file = bucket.file "file-name.csv" load_job = dataset.load_job "my_new_table", file do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Pass a list of google-cloud-storage files:
require "google/cloud/bigquery" require "google/cloud/storage" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" storage = Google::Cloud::Storage.new bucket = storage.bucket "my-bucket" file = bucket.file "file-name.csv" list = [file, "gs://my-bucket/file-name2.csv"] load_job = dataset.load_job "my_new_table", list do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Upload a file directly:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" file = File.open "my_data.csv" load_job = dataset.load_job "my_new_table", file do |schema| schema.string "first_name", mode: :required schema.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end
Schema is not required with a Cloud Datastore backup:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" load_job = dataset.load_job( "my_new_table", "gs://my-bucket/xxxx.kind_name.backup_info") do |j| j.format = "datastore_backup" end
#location
def location() -> String, nil
The geographic location where the dataset should reside. Possible
values include EU
and US
. The default value is US
.
-
(String, nil) — The geographic location, or
nil
if the object is a reference (see #reference?).
#model
def model(model_id, skip_lookup: nil) -> Google::Cloud::Bigquery::Model, nil
Retrieves an existing model by ID.
- model_id (String) — The ID of a model.
-
skip_lookup (Boolean) (defaults to: nil) — Optionally create just a local reference
object without verifying that the resource exists on the BigQuery
service. Calls made on this object will raise errors if the resource
does not exist. Default is
false
. Optional.
-
(Google::Cloud::Bigquery::Model, nil) — Returns
nil
if the model does not exist.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" model = dataset.model "my_model" puts model.model_id
Avoid retrieving the model resource with skip_lookup
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" model = dataset.model "my_model", skip_lookup: true
#models
def models(token: nil, max: nil) -> Array<Google::Cloud::Bigquery::Model>
Retrieves the list of models belonging to the dataset.
- token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view.
- max (Integer) (defaults to: nil) — Maximum number of models to return.
- (Array<Google::Cloud::Bigquery::Model>) — An array of models (See Model::List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" models = dataset.models models.each do |model| puts model.model_id end
Retrieve all models: (See Model::List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" models = dataset.models models.all do |model| puts model.model_id end
#modified_at
def modified_at() -> Time, nil
The date when this dataset or any of its tables was last modified.
-
(Time, nil) — The last modified time, or
nil
if not present or the object is a reference (see #reference?).
#name
def name() -> String, nil
A descriptive name for the dataset.
-
(String, nil) — The friendly name, or
nil
if the object is a reference (see #reference?).
#name=
def name=(new_name)
Updates the descriptive name for the dataset.
If the dataset is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
-
new_name (String) — The new friendly name, or
nil
if the object is a reference (see #reference?).
#project_id
def project_id() -> String
The ID of the project containing this dataset.
- (String) — The project ID.
#query
def query(query, params: nil, types: nil, external: nil, max: nil, cache: true, standard_sql: nil, legacy_sql: nil, session_id: nil, &block) { |job| ... } -> Google::Cloud::Bigquery::Data
Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
- query (String) — A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
-
params (Array, Hash) (defaults to: nil) — Standard SQL only. Used to pass query arguments when the
query
string contains either positional (?
) or named (@myparam
) query parameters. If value passed is an array["foo"]
, the query must use positional query parameters. If value passed is a hash{ myparam: "foo" }
, the query must use named query parameters. When set,legacy_sql
will automatically be set to false andstandard_sql
to true.BigQuery types are converted from Ruby types as follows:
| BigQuery | Ruby | Notes | |--------------|--------------------------------------|----------------------------------------------------| |
BOOL
|true
/false
| | |INT64
|Integer
| | |FLOAT64
|Float
| | |NUMERIC
|BigDecimal
|BigDecimal
values will be rounded to scale 9. | |BIGNUMERIC
|BigDecimal
| NOT AUTOMATIC: Must be mapped usingtypes
, below.| |STRING
|String
| | |DATETIME
|DateTime
|DATETIME
does not support time zone. | |DATE
|Date
| | |GEOGRAPHY
|String
(WKT or GeoJSON) | NOT AUTOMATIC: Must be mapped usingtypes
, below.| |TIMESTAMP
|Time
| | |TIME
|Google::Cloud::BigQuery::Time
| | |BYTES
|File
,IO
,StringIO
, or similar | | |ARRAY
|Array
| Nested arrays,nil
values are not supported. | |STRUCT
|Hash
| Hash keys may be strings or symbols. |See Data Types for an overview of each BigQuery data type, including allowed values. For the
GEOGRAPHY
type, see Working with BigQuery GIS data. -
types (Array, Hash) (defaults to: nil) — Standard SQL only. Types of the SQL parameters in
params
. It is not always possible to infer the right SQL type from a value inparams
. In these cases,types
must be used to specify the SQL type for these values.Arguments must match the value type passed to
params
. This must be anArray
when the query uses positional query parameters. This must be anHash
when the query uses named query parameters. The values should be BigQuery type codes from the following list::BOOL
:INT64
:FLOAT64
:NUMERIC
:BIGNUMERIC
:STRING
:DATETIME
:DATE
:GEOGRAPHY
:TIMESTAMP
:TIME
:BYTES
Array
- Lists are specified by providing the type code in an array. For example, an array of integers are specified as[:INT64]
.Hash
- Types for STRUCT values (Hash
objects) are specified using aHash
object, where the keys match theparams
hash, and the values are the types value that matches the data.
Types are optional.
- external (Hash<String|Symbol, External::DataSource>) (defaults to: nil) — A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.
- max (Integer) (defaults to: nil) — The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
- cache (Boolean) (defaults to: true) — Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
-
standard_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's
standard
SQL
dialect for this query. If set to true, the query will use standard
SQL rather than the legacy
SQL
dialect. When set to true, the values of
large_results
andflatten
are ignored; the query will be run as iflarge_results
is true andflatten
is false. Optional. The default value is true. -
legacy_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's
legacy
SQL
dialect for this query. If set to false, the query will use
BigQuery's standard
SQL
When set to false, the values of
large_results
andflatten
are ignored; the query will be run as iflarge_results
is true andflatten
is false. Optional. The default value is false. -
session_id (String) (defaults to: nil) — The ID of an existing session. See the
create_session
param in #query_job and Job#session_id.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::QueryJob::Updater) — a job configuration object for setting additional options for the query.
- (Google::Cloud::Bigquery::Data) — A new data object.
Query using standard SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "SELECT name FROM my_table" # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using legacy SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "SELECT name FROM my_table", legacy_sql: true # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using positional query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "SELECT name FROM my_table WHERE id = ?", params: [1] # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using named query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "SELECT name FROM my_table WHERE id = @id", params: { id: 1 } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using named query parameters with types:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "SELECT name FROM my_table WHERE id IN UNNEST(@ids)", params: { ids: [] }, types: { ids: [:INT64] } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Execute a DDL statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "CREATE TABLE my_table (x INT64)" table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
Execute a DML statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" data = dataset.query "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL" puts data.num_dml_affected_rows
Run query in a session:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true job.wait_until_done! session_id = job.session_id data = dataset.query "SELECT * FROM temptable", session_id: session_id
Query using external data source, set destination:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" csv_url = "gs://bucket/path/to/data.csv" csv_table = dataset.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end data = dataset.query "SELECT * FROM my_ext_table" do |query| query.external = { my_ext_table: csv_table } query.table = dataset.table "my_table", skip_lookup: true end # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
#query_job
def query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil) { |job| ... } -> Google::Cloud::Bigquery::QueryJob
Queries data by creating a query job.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method. If the dataset is a full resource representation (see #resource_full?), the location of the job will be automatically set to the location of the dataset.
- query (String) — A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
-
params (Array, Hash) (defaults to: nil) — Standard SQL only. Used to pass query arguments when the
query
string contains either positional (?
) or named (@myparam
) query parameters. If value passed is an array["foo"]
, the query must use positional query parameters. If value passed is a hash{ myparam: "foo" }
, the query must use named query parameters. When set,legacy_sql
will automatically be set to false andstandard_sql
to true.BigQuery types are converted from Ruby types as follows:
| BigQuery | Ruby | Notes | |--------------|--------------------------------------|----------------------------------------------------| |
BOOL
|true
/false
| | |INT64
|Integer
| | |FLOAT64
|Float
| | |NUMERIC
|BigDecimal
|BigDecimal
values will be rounded to scale 9. | |BIGNUMERIC
|BigDecimal
| NOT AUTOMATIC: Must be mapped usingtypes
, below.| |STRING
|String
| | |DATETIME
|DateTime
|DATETIME
does not support time zone. | |DATE
|Date
| | |GEOGRAPHY
|String
(WKT or GeoJSON) | NOT AUTOMATIC: Must be mapped usingtypes
, below.| |TIMESTAMP
|Time
| | |TIME
|Google::Cloud::BigQuery::Time
| | |BYTES
|File
,IO
,StringIO
, or similar | | |ARRAY
|Array
| Nested arrays,nil
values are not supported. | |STRUCT
|Hash
| Hash keys may be strings or symbols. |See Data Types for an overview of each BigQuery data type, including allowed values. For the
GEOGRAPHY
type, see Working with BigQuery GIS data. -
types (Array, Hash) (defaults to: nil) — Standard SQL only. Types of the SQL parameters in
params
. It is not always possible to infer the right SQL type from a value inparams
. In these cases,types
must be used to specify the SQL type for these values.Arguments must match the value type passed to
params
. This must be anArray
when the query uses positional query parameters. This must be anHash
when the query uses named query parameters. The values should be BigQuery type codes from the following list::BOOL
:INT64
:FLOAT64
:NUMERIC
:BIGNUMERIC
:STRING
:DATETIME
:DATE
:GEOGRAPHY
:TIMESTAMP
:TIME
:BYTES
Array
- Lists are specified by providing the type code in an array. For example, an array of integers are specified as[:INT64]
.Hash
- Types for STRUCT values (Hash
objects) are specified using aHash
object, where the keys match theparams
hash, and the values are the types value that matches the data.
Types are optional.
- external (Hash<String|Symbol, External::DataSource>) (defaults to: nil) — A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.
-
priority (String) (defaults to: "INTERACTIVE") — Specifies a priority for the query. Possible
values include
INTERACTIVE
andBATCH
. The default value isINTERACTIVE
. - cache (Boolean) (defaults to: true) — Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
- table (Table) (defaults to: nil) — The destination table where the query results should be stored. If not present, a new table will be created to store the results.
-
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies the action that occurs if the destination table already exists. The default value is
empty
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- A 'duplicate' error is returned in the job result if the table exists and contains data.
- dryrun (Boolean) (defaults to: nil) — If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.
- standard_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.
- legacy_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.
-
large_results (Boolean) (defaults to: nil) — This option is specific to Legacy SQL.
If
true
, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requirestable
parameter to be set. -
flatten (Boolean) (defaults to: nil) — This option is specific to Legacy SQL.
Flattens all nested and repeated fields in the query results. The
default value is
true
.large_results
parameter must betrue
if this is set tofalse
. - maximum_billing_tier (Integer) (defaults to: nil) — Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). WARNING: The billed byte amount can be multiplied by an amount up to this number! Most users should not need to alter this setting, and we recommend that you avoid introducing new uses of it. Deprecated.
- maximum_bytes_billed (Integer) (defaults to: nil) — Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.
-
job_id (String) (defaults to: nil) — A user-defined ID for the query job. The ID
must contain only letters (
[A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be
prepended to a generated value to produce a unique job ID. For
example, the prefix
daily_import_job_
can be given to generate a job ID such asdaily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh
. The prefix must contain only letters ([A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length of the entire ID is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used. -
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
-
udfs (Array<String>, String) (defaults to: nil) — User-defined function resources
used in a legacy SQL query. May be either a code resource to load from
a Google Cloud Storage URI (
gs://bucket/path
), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code.This parameter is used for defining User Defined Function (UDF) resources only when using legacy SQL. Users of standard SQL should leverage either DDL (e.g.
CREATE [TEMPORARY] FUNCTION ...
) or the Routines API to define UDF resources.For additional information on migrating, see: Migrating to standard SQL - Differences in user-defined JavaScript functions
-
create_session (Boolean) (defaults to: nil) — If true, creates a new session, where the
session ID will be a server generated random id. If false, runs query
with an existing session ID when one is provided in the
session_id
param, otherwise runs query in non-session mode. See Job#session_id. The default value is false. -
session_id (String) (defaults to: nil) — The ID of an existing session. See also the
create_session
param and Job#session_id.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::QueryJob::Updater) — a job configuration object for setting additional options for the query.
- (Google::Cloud::Bigquery::QueryJob) — A new query job object.
Query using standard SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "SELECT name FROM my_table" job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using legacy SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "SELECT name FROM my_table", legacy_sql: true job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using positional query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "SELECT name FROM my_table WHERE id = ?", params: [1] job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using named query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "SELECT name FROM my_table WHERE id = @id", params: { id: 1 } job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using named query parameters with types:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "SELECT name FROM my_table WHERE id IN UNNEST(@ids)", params: { ids: [] }, types: { ids: [:INT64] } job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Execute a DDL statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "CREATE TABLE my_table (x INT64)" job.wait_until_done! if !job.failed? table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE end
Execute a DML statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "UPDATE my_table SET x = x + 1 WHERE x IS NOT NULL" job.wait_until_done! if !job.failed? puts job.num_dml_affected_rows end
Run query in a session:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.query_job "CREATE TEMPORARY TABLE temptable AS SELECT 17 as foo", create_session: true job.wait_until_done! session_id = job.session_id data = dataset.query "SELECT * FROM temptable", session_id: session_id
Query using external data source, set destination:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" csv_url = "gs://bucket/path/to/data.csv" csv_table = dataset.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end job = dataset.query_job "SELECT * FROM my_ext_table" do |query| query.external = { my_ext_table: csv_table } query.table = dataset.table "my_table", skip_lookup: true end job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
#reference?
def reference?() -> Boolean
Whether the dataset was created without retrieving the resource representation from the BigQuery service.
-
(Boolean) —
true
when the dataset is just a local reference object,false
otherwise.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true dataset.reference? # true dataset.reload! dataset.reference? # false
#refresh!
def refresh!() -> Google::Cloud::Bigquery::Dataset
Reloads the dataset with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Dataset) — Returns the reloaded dataset.
Skip retrieving the dataset from the service, then load it:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true dataset.reload!
#reload!
def reload!() -> Google::Cloud::Bigquery::Dataset
Reloads the dataset with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Dataset) — Returns the reloaded dataset.
Skip retrieving the dataset from the service, then load it:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true dataset.reload!
#resource?
def resource?() -> Boolean
Whether the dataset was created with a resource representation from the BigQuery service.
-
(Boolean) —
true
when the dataset was created with a resource representation,false
otherwise.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true dataset.resource? # false dataset.reload! dataset.resource? # true
#resource_full?
def resource_full?() -> Boolean
Whether the dataset was created with a full resource representation from the BigQuery service.
-
(Boolean) —
true
when the dataset was created with a full resource representation,false
otherwise.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" dataset.resource_full? # true
#resource_partial?
def resource_partial?() -> Boolean
Whether the dataset was created with a partial resource representation from the BigQuery service by retrieval through Project#datasets. See Datasets: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.
-
(Boolean) —
true
when the dataset was created with a partial resource representation,false
otherwise.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.datasets.first dataset.resource_partial? # true dataset.description # Loads the full resource. dataset.resource_partial? # false
#routine
def routine(routine_id, skip_lookup: nil) -> Google::Cloud::Bigquery::Routine, nil
Retrieves an existing routine by ID.
- routine_id (String) — The ID of a routine.
-
skip_lookup (Boolean) (defaults to: nil) — Optionally create just a local reference
object without verifying that the resource exists on the BigQuery
service. Calls made on this object will raise errors if the resource
does not exist. Default is
false
. Optional.
-
(Google::Cloud::Bigquery::Routine, nil) — Returns
nil
if the routine does not exist.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routine = dataset.routine "my_routine" puts routine.routine_id
Avoid retrieving the routine resource with skip_lookup
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routine = dataset.routine "my_routine", skip_lookup: true
#routines
def routines(token: nil, max: nil, filter: nil) -> Array<Google::Cloud::Bigquery::Routine>
Retrieves the list of routines belonging to the dataset.
- token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view.
- max (Integer) (defaults to: nil) — Maximum number of routines to return.
-
filter (String) (defaults to: nil) — If set, then only the routines matching this filter are returned. The current supported
form is
routineType:
, with a Routine#routine_type enum value. Example:routineType:SCALAR_FUNCTION
.
- (Array<Google::Cloud::Bigquery::Routine>) — An array of routines (See Routine::List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routines = dataset.routines routines.each do |routine| puts routine.routine_id end
Retrieve all routines: (See Routine::List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" routines = dataset.routines routines.all do |routine| puts routine.routine_id end
#table
def table(table_id, skip_lookup: nil) -> Google::Cloud::Bigquery::Table, nil
Retrieves an existing table by ID.
- table_id (String) — The ID of a table.
-
skip_lookup (Boolean) (defaults to: nil) — Optionally create just a local reference
object without verifying that the resource exists on the BigQuery
service. Calls made on this object will raise errors if the resource
does not exist. Default is
false
. Optional.
-
(Google::Cloud::Bigquery::Table, nil) — Returns
nil
if the table does not exist.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.table "my_table" puts table.name
Avoid retrieving the table resource with skip_lookup
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.table "my_table", skip_lookup: true
#tables
def tables(token: nil, max: nil) -> Array<Google::Cloud::Bigquery::Table>
Retrieves the list of tables belonging to the dataset.
- token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view.
- max (Integer) (defaults to: nil) — Maximum number of tables to return.
- (Array<Google::Cloud::Bigquery::Table>) — An array of tables (See Table::List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" tables = dataset.tables tables.each do |table| puts table.name end
Retrieve all tables: (See Table::List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" tables = dataset.tables tables.all do |table| puts table.name end