Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::LoadJob::Updater.
Yielded to a block to accumulate changes for a patch request.
Inherits
Methods
#autodetect=
def autodetect=(val)
Allows BigQuery to autodetect the schema.
-
val (Boolean) — Indicates if BigQuery should automatically
infer the options and schema for CSV and JSON sources. The default
value is
false
.
#bignumeric
def bignumeric(name, description: nil, mode: :nullable, policy_tags: nil, precision: nil, scale: nil, default_value_expression: nil)
Adds a bignumeric number field to the schema. BIGNUMERIC
is a
decimal type with fixed precision and scale. Precision is the
number of digits that the number contains. Scale is how many of
these digits appear after the decimal point. It supports:
Precision: 76.76 (the 77th digit is partial) Scale: 38 Min: -5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38 Max: 5.7896044618658097711785492504343953926634992332820282019728792003956564819967E+38
This type can represent decimal fractions exactly, and is suitable for financial calculations.
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
precision (Integer) (defaults to: nil) — The precision (maximum number of total
digits) for the field. Acceptable values for precision must be:
1 ≤ (precision - scale) ≤ 38
. Values for scale must be:0 ≤ scale ≤ 38
. If the scale value is set, the precision value must be set as well. -
scale (Integer) (defaults to: nil) — The scale (maximum number of digits in the
fractional part) for the field. Acceptable values for precision
must be:
1 ≤ (precision - scale) ≤ 38
. Values for scale must be:0 ≤ scale ≤ 38
. If the scale value is set, the precision value must be set as well. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.bignumeric "total_cost", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.bignumeric "total_cost", default_value_expression: "1.0e10" end
#boolean
def boolean(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.boolean "active", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.boolean "active", default_value_expression: "true" end
#bytes
def bytes(name, description: nil, mode: :nullable, policy_tags: nil, max_length: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. - max_length (Integer) (defaults to: nil) — The maximum the maximum number of bytes in the field.
-
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.bytes "avatar", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.bytes "avatar", default_value_expression: "b'101'" end
#cancel
def cancel()
#check_for_mutated_schema!
def check_for_mutated_schema!()
Make sure any access changes are saved
#clustering_fields=
def clustering_fields=(fields)
Sets the list of fields on which data should be clustered.
Only top-level, non-repeated, simple-type fields are supported. When you cluster a table using multiple columns, the order of columns you specify is important. The order of the specified columns determines the sort order of the data.
BigQuery supports clustering for both partitioned and non-partitioned tables.
See #clustering_fields, Table#clustering_fields and Table#clustering_fields=.
- fields (Array<String>) — The clustering fields. Only top-level, non-repeated, simple-type fields are supported.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.time_partitioning_type = "DAY" job.time_partitioning_field = "dob" job.schema do |schema| schema.timestamp "dob", mode: :required schema.string "first_name", mode: :required schema.string "last_name", mode: :required end job.clustering_fields = ["last_name", "first_name"] end load_job.wait_until_done! load_job.done? #=> true
#create=
def create=(new_create)
Sets the create disposition.
This specifies whether the job is allowed to create new tables. The
default value is needed
.
The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
- new_create (String) — The new create disposition.
#create_session=
def create_session=(value)
Sets the create_session property. If true, creates a new session,
where session id will be a server generated random id. If false,
runs query with an existing #session_id=, otherwise runs query in
non-session mode. The default value is false
.
value is false
.
- value (Boolean) — The create_session property. The default
#date
def date(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.date "birthday", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.date "birthday", default_value_expression: "CURRENT_DATE" end
#datetime
def datetime(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.datetime "target_end", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.datetime "target_end", default_value_expression: "CURRENT_DATETIME" end
#delimiter=
def delimiter=(val)
Sets the separator for fields in a CSV file.
-
val (String) — Specifices the separator for fields in a CSV
file. BigQuery converts the string to
ISO-8859-1
encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. Default is,
.
#encoding=
def encoding=(val)
Sets the character encoding of the data.
-
val (String) — The character encoding of the data. The
supported values are
UTF-8
orISO-8859-1
. The default value isUTF-8
.
#encryption=
def encryption=(val)
Sets the encryption configuration of the destination table.
- val (Google::Cloud::BigQuery::EncryptionConfiguration) — Custom encryption configuration (e.g., Cloud KMS keys).
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name job = dataset.load_job "my_table", "gs://abc/file" do |job| job.encryption = encrypt_config end
#float
def float(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.float "price", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.float "price", default_value_expression: "1.0" end
#format=
def format=(new_format)
Sets the source file format. The default value is csv
.
The following values are supported:
csv
- CSVjson
- Newline-delimited JSONavro
- Avroorc
- ORCparquet
- Parquetdatastore_backup
- Cloud Datastore backup
- new_format (String) — The new source format.
#geography
def geography(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.record "cities_lived", mode: :repeated do |cities_lived| cities_lived.geography "location", mode: :required cities_lived.integer "number_of_years", mode: :required end end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.geography "location", default_value_expression: "ST_GEOGPOINT(-122.084801, 37.422131)" end
#hive_partitioning_mode=
def hive_partitioning_mode=(mode)
Sets the mode of hive partitioning to use when reading data. The following modes are supported:
auto
: automatically infer partition key name(s) and type(s).strings
: automatically infer partition key name(s). All types are interpreted as strings.custom
: partition key schema is encoded in the source URI prefix.
Not all storage formats support hive partitioning. Requesting hive partitioning on an unsupported format
will lead to an error. Currently supported types include: avro
, csv
, json
, orc
and parquet
.
- mode (String, Symbol) — The mode of hive partitioning to use when reading data.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*" source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.format = :parquet job.hive_partitioning_mode = :auto job.hive_partitioning_source_uri_prefix = source_uri_prefix end load_job.wait_until_done! load_job.done? #=> true
#hive_partitioning_source_uri_prefix=
def hive_partitioning_source_uri_prefix=(source_uri_prefix)
Sets the common prefix for all source uris when hive partition detection is requested. The prefix must end immediately before the partition key encoding begins. For example, consider files following this data layout:
gs://bucket/path_to_table/dt=2019-01-01/country=BR/id=7/file.avro
gs://bucket/path_to_table/dt=2018-12-31/country=CA/id=3/file.avro
When hive partitioning is requested with either AUTO
or STRINGS
mode, the common prefix can be either of
gs://bucket/path_to_table
or gs://bucket/path_to_table/
(trailing slash does not matter).
- source_uri_prefix (String) — The common prefix for all source uris.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/*" source_uri_prefix = "gs://cloud-samples-data/bigquery/hive-partitioning-samples/autolayout/" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.format = :parquet job.hive_partitioning_mode = :auto job.hive_partitioning_source_uri_prefix = source_uri_prefix end load_job.wait_until_done! load_job.done? #=> true
#ignore_unknown=
def ignore_unknown=(val)
Allows unknown columns to be ignored.
-
val (Boolean) —
Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is
false
.The
format
property determines what BigQuery treats as an extra value:CSV
: Trailing columnsJSON
: Named values that don't match any column names
#integer
def integer(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.integer "age", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.integer "age", default_value_expression: "1" end
#jagged_rows=
def jagged_rows=(val)
Sets flag for allowing jagged rows.
Accept rows that are missing trailing optional columns. The missing
values are treated as nulls. If false
, records with missing
trailing columns are treated as bad records, and if there are too
many bad records, an invalid error is returned in the job result.
The default value is false
. Only applicable to CSV, ignored for
other formats.
- val (Boolean) — Accept rows that are missing trailing optional columns.
#labels=
def labels=(val)
Sets the labels to use for the load job.
-
val (Hash) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
#location=
def location=(value)
Sets the geographic location where the job should run. Required except for US and EU.
- value (String) — A geographic location, such as "US", "EU" or "asia-northeast1". Required except for US and EU.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |j| j.schema do |s| s.string "first_name", mode: :required s.record "cities_lived", mode: :repeated do |r| r.string "place", mode: :required r.integer "number_of_years", mode: :required end end j.location = "EU" end
#max_bad_records=
def max_bad_records=(val)
Sets the maximum number of bad records that can be ignored.
-
val (Integer) — The maximum number of bad records that
BigQuery can ignore when running the job. If the number of bad
records exceeds this value, an invalid error is returned in the
job result. The default value is
0
, which requires that all records are valid.
#null_marker=
def null_marker=(val)
Sets the string that represents a null value in a CSV file.
-
val (String) — Specifies a string that represents a null value
in a CSV file. For example, if you specify
\N
, BigQuery interprets\N
as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
#numeric
def numeric(name, description: nil, mode: :nullable, policy_tags: nil, precision: nil, scale: nil, default_value_expression: nil)
Adds a numeric number field to the schema. NUMERIC
is a decimal
type with fixed precision and scale. Precision is the number of
digits that the number contains. Scale is how many of these
digits appear after the decimal point. It supports:
Precision: 38 Scale: 9 Min: -9.9999999999999999999999999999999999999E+28 Max: 9.9999999999999999999999999999999999999E+28
This type can represent decimal fractions exactly, and is suitable for financial calculations.
See Schema#numeric
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
precision (Integer) (defaults to: nil) — The precision (maximum number of total
digits) for the field. Acceptable values for precision must be:
1 ≤ (precision - scale) ≤ 29
. Values for scale must be:0 ≤ scale ≤ 9
. If the scale value is set, the precision value must be set as well. -
scale (Integer) (defaults to: nil) — The scale (maximum number of digits in the
fractional part) for the field. Acceptable values for precision
must be:
1 ≤ (precision - scale) ≤ 29
. Values for scale must be:0 ≤ scale ≤ 9
. If the scale value is set, the precision value must be set as well. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.numeric "total_cost", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.numeric "total_cost", default_value_expression: "1.0e10" end
#parquet_enable_list_inference=
def parquet_enable_list_inference=(enable_list_inference)
Sets whether to use schema inference specifically for Parquet LIST
logical type.
-
enable_list_inference (Boolean) — The
enable_list_inference
value to use in Parquet options.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uris = ["gs://mybucket/00/*.parquet", "gs://mybucket/01/*.parquet"] load_job = dataset.load_job "my_new_table", gcs_uris do |job| job.format = :parquet job.parquet_enable_list_inference = true end load_job.wait_until_done! load_job.done? #=> true
#parquet_enum_as_string=
def parquet_enum_as_string=(enum_as_string)
Sets whether to infer Parquet ENUM
logical type as STRING
instead of BYTES
by default.
-
enum_as_string (Boolean) — The
enum_as_string
value to use in Parquet options.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uris = ["gs://mybucket/00/*.parquet", "gs://mybucket/01/*.parquet"] load_job = dataset.load_job "my_new_table", gcs_uris do |job| job.format = :parquet job.parquet_enum_as_string = true end load_job.wait_until_done! load_job.done? #=> true
#projection_fields=
def projection_fields=(new_fields)
Sets the projection fields.
If the format
option is set to datastore_backup
, indicates
which entity properties to load from a Cloud Datastore backup.
Property names are case sensitive and must be top-level properties.
If not set, BigQuery loads all properties. If any named property
isn't found in the Cloud Datastore backup, an invalid error is
returned.
- new_fields (Array<String>) — The new projection fields.
#quote=
def quote=(val)
Sets the character to use to quote string values in CSVs.
-
val (String) — The value that is used to quote data sections
in a CSV file. BigQuery converts the string to ISO-8859-1
encoding, and then uses the first byte of the encoded string to
split the data in its raw, binary state. The default value is a
double-quote
"
. If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true.
#quoted_newlines=
def quoted_newlines=(val)
Allows quoted data sections to contain newline characters in CSV.
-
val (Boolean) — Indicates if BigQuery should allow quoted data
sections that contain newline characters in a CSV file. The
default value is
false
.
#range_partitioning_end=
def range_partitioning_end=(range_end)
Sets the end of range partitioning, exclusive, for the destination table. See Creating and using integer range partitioned tables.
You can only set range partitioning when creating a table. BigQuery does not allow you to change partitioning on an existing table.
See #range_partitioning_start=, #range_partitioning_interval= and #range_partitioning_field=.
- range_end (Integer) — The end of range partitioning, exclusive.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.schema do |schema| schema.integer "my_table_id", mode: :required schema.string "my_table_data", mode: :required end job.range_partitioning_field = "my_table_id" job.range_partitioning_start = 0 job.range_partitioning_interval = 10 job.range_partitioning_end = 100 end load_job.wait_until_done! load_job.done? #=> true
#range_partitioning_field=
def range_partitioning_field=(field)
Sets the field on which to range partition the table. See Creating and using integer range partitioned tables.
See #range_partitioning_start=, #range_partitioning_interval= and #range_partitioning_end=.
You can only set range partitioning when creating a table. BigQuery does not allow you to change partitioning on an existing table.
-
field (String) — The range partition field. the destination table is partitioned by this
field. The field must be a top-level
NULLABLE/REQUIRED
field. The only supported type isINTEGER/INT64
.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.schema do |schema| schema.integer "my_table_id", mode: :required schema.string "my_table_data", mode: :required end job.range_partitioning_field = "my_table_id" job.range_partitioning_start = 0 job.range_partitioning_interval = 10 job.range_partitioning_end = 100 end load_job.wait_until_done! load_job.done? #=> true
#range_partitioning_interval=
def range_partitioning_interval=(range_interval)
Sets width of each interval for data in range partitions. See Creating and using integer range partitioned tables.
You can only set range partitioning when creating a table. BigQuery does not allow you to change partitioning on an existing table.
See #range_partitioning_field=, #range_partitioning_start= and #range_partitioning_end=.
- range_interval (Integer) — The width of each interval, for data in partitions.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.schema do |schema| schema.integer "my_table_id", mode: :required schema.string "my_table_data", mode: :required end job.range_partitioning_field = "my_table_id" job.range_partitioning_start = 0 job.range_partitioning_interval = 10 job.range_partitioning_end = 100 end load_job.wait_until_done! load_job.done? #=> true
#range_partitioning_start=
def range_partitioning_start=(range_start)
Sets the start of range partitioning, inclusive, for the destination table. See Creating and using integer range partitioned tables.
You can only set range partitioning when creating a table. BigQuery does not allow you to change partitioning on an existing table.
See #range_partitioning_field=, #range_partitioning_interval= and #range_partitioning_end=.
- range_start (Integer) — The start of range partitioning, inclusive.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.schema do |schema| schema.integer "my_table_id", mode: :required schema.string "my_table_data", mode: :required end job.range_partitioning_field = "my_table_id" job.range_partitioning_start = 0 job.range_partitioning_interval = 10 job.range_partitioning_end = 100 end load_job.wait_until_done! load_job.done? #=> true
#record
def record(name, description: nil, mode: nil, default_value_expression: nil, &block) { |nested_schema| ... }
Adds a record field to the schema. A block must be passed describing the nested fields of the record. For more information about nested and repeated records, see Loading denormalized, nested, and repeated data .
See Schema#record.
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: nil) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
- (nested_schema) — a block for setting the nested schema
- nested_schema (Schema) — the object accepting the nested schema
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.record "cities_lived", mode: :repeated do |cities_lived| cities_lived.string "place", mode: :required cities_lived.integer "number_of_years", mode: :required end end
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.record "cities_lived", default_value_expression: "STRUCT('place',10)" do |cities_lived| cities_lived.string "place", mode: :required cities_lived.integer "number_of_years", mode: :required end end
#refresh!
def refresh!()
#reload!
def reload!()
#rerun!
def rerun!()
#schema
def schema(replace: false) { |schema| ... } -> Google::Cloud::Bigquery::Schema
Returns the table's schema. This method can also be used to set, replace, or add to the schema by passing a block. See Schema for available methods.
-
replace (Boolean) (defaults to: false) — Whether to replace the existing schema with
the new schema. If
true
, the fields will replace the existing schema. Iffalse
, the fields will be added to the existing schema. When a table already contains data, schema changes must be additive. Thus, the default value isfalse
.
- (schema) — a block for setting the schema
- schema (Schema) — the object accepting the schema
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |j| j.schema do |s| s.string "first_name", mode: :required s.record "cities_lived", mode: :repeated do |r| r.string "place", mode: :required r.integer "number_of_years", mode: :required end end end
#schema=
def schema=(new_schema)
Sets the schema of the destination table.
- new_schema (Google::Cloud::Bigquery::Schema) — The schema for the destination table. Optional. The schema can be omitted if the destination table already exists, or if you're loading data from a source that includes a schema, such as Avro or a Google Cloud Datastore backup.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new schema = bigquery.schema do |s| s.string "first_name", mode: :required s.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |j| j.schema = schema end
#schema_update_options=
def schema_update_options=(new_options)
Sets the schema update options, which allow the schema of the
destination table to be updated as a side effect of the load job if
a schema is autodetected or supplied in the job configuration.
Schema update options are supported in two cases: when write
disposition is WRITE_APPEND
; when write disposition is
WRITE_TRUNCATE
and the destination table is a partition of a
table, specified by partition decorators. For normal tables,
WRITE_TRUNCATE
will always overwrite the schema. One or more of
the following values are specified:
ALLOW_FIELD_ADDITION
: allow adding a nullable field to the schema.ALLOW_FIELD_RELAXATION
: allow relaxing a required field in the original schema to nullable.
- new_options (Array<String>) — The new schema update options.
#session_id=
def session_id=(value)
Sets the session ID for a query run in session mode. See #create_session=.
-
value (String) — The session ID. The default value is
nil
.
#skip_leading=
def skip_leading=(val)
Sets the number of leading rows to skip in the file.
-
val (Integer) — The number of rows at the top of a CSV file
that BigQuery will skip when loading the data. The default
value is
0
. This property is useful if you have header rows in the file that should be skipped.
#source_uris=
def source_uris=(new_uris)
Sets the source URIs to load.
The fully-qualified URIs that point to your data in Google Cloud.
- For Google Cloud Storage URIs: Each URI can contain one '' wildcard character and it must come after the 'bucket' name. Size limits related to load jobs apply to external data sources. For
- Google Cloud Bigtable URIs: Exactly one URI can be specified and it has be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table.
- For Google Cloud Datastore backups: Exactly one URI can be specified. Also, the '' wildcard character is not allowed.
- new_uris (Array<String>) — The new source URIs to load.
#string
def string(name, description: nil, mode: :nullable, policy_tags: nil, max_length: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. - max_length (Integer) (defaults to: nil) — The maximum UTF-8 length of strings allowed in the field.
-
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.string "first_name", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.string "first_name", default_value_expression: "'name'" end
#time
def time(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.time "duration", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.time "duration", default_value_expression: "CURRENT_TIME" end
#time_partitioning_expiration=
def time_partitioning_expiration=(expiration)
Sets the time partition expiration for the destination table. See Partitioned Tables.
The destination table must also be time partitioned. See #time_partitioning_type=.
- expiration (Integer) — An expiration time, in seconds, for data in time partitions.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.time_partitioning_type = "DAY" job.time_partitioning_expiration = 86_400 end load_job.wait_until_done! load_job.done? #=> true
#time_partitioning_field=
def time_partitioning_field=(field)
Sets the field on which to time partition the destination table. If not
set, the destination table is time partitioned by pseudo column
_PARTITIONTIME
; if set, the table is time partitioned by this field.
See Partitioned
Tables.
The destination table must also be time partitioned. See #time_partitioning_type=.
You can only set the time partitioning field while creating a table. BigQuery does not allow you to change partitioning on an existing table.
- field (String) — The time partition field. The field must be a top-level TIMESTAMP or DATE field. Its mode must be NULLABLE or REQUIRED.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.time_partitioning_type = "DAY" job.time_partitioning_field = "dob" job.schema do |schema| schema.timestamp "dob", mode: :required end end load_job.wait_until_done! load_job.done? #=> true
#time_partitioning_require_filter=
def time_partitioning_require_filter=(val)
If set to true, queries over the destination table will require a time partition filter that can be used for time partition elimination to be specified. See Partitioned Tables.
-
val (Boolean) — Indicates if queries over the destination table
will require a time partition filter. The default value is
false
.
#time_partitioning_type=
def time_partitioning_type=(type)
Sets the time partitioning for the destination table. See Partitioned Tables.
You can only set the time partitioning field while creating a table. BigQuery does not allow you to change partitioning on an existing table.
-
type (String) — The time partition type. The supported types are
DAY
,HOUR
,MONTH
, andYEAR
, which will generate one partition per day, hour, month, and year, respectively.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" gcs_uri = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gcs_uri do |job| job.time_partitioning_type = "DAY" end load_job.wait_until_done! load_job.done? #=> true
#timestamp
def timestamp(name, description: nil, mode: :nullable, policy_tags: nil, default_value_expression: nil)
-
name (String) — The field name. The name must contain only
letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
), and must start with a letter or underscore. The maximum length is 128 characters. - description (String) (defaults to: nil) — A description of the field.
-
mode (Symbol) (defaults to: :nullable) — The field's mode. The possible values are
:nullable
,:required
, and:repeated
. The default value is:nullable
. -
policy_tags (Array<String>, String) (defaults to: nil) — The policy tag list or
single policy tag for the field. Policy tag identifiers are of
the form
projects/*/locations/*/taxonomies/*/policyTags/*
. At most 1 policy tag is currently allowed. -
default_value_expression (String) (defaults to: nil) —
The default value of a field using a SQL expression. It can only be set for top level fields (columns). Use a struct or array expression to specify default value for the entire struct or array. The valid SQL expressions are:
- Literals for all data types, including STRUCT and ARRAY.
- The following functions:
CURRENT_TIMESTAMP
CURRENT_TIME
CURRENT_DATE
CURRENT_DATETIME
GENERATE_UUID
RAND
SESSION_USER
ST_GEOPOINT
- Struct or array composed with the above allowed functions, for example: "[CURRENT_DATE(), DATE '2020-01-01'"]
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.timestamp "creation_date", mode: :required end
Add field with default value.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" job = dataset.load_job "my_table", "gs://abc/file" do |schema| schema.timestamp "creation_date", default_value_expression: "CURRENT_TIMESTAMP" end
#updates
def updates()
A list of attributes that were updated.
#wait_until_done!
def wait_until_done!()
#write=
def write=(new_write)
Sets the write disposition.
This specifies how to handle data already present in the table. The
default value is append
.
The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- An error will be returned if the table already contains data.
- new_write (String) — The new write disposition.