Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::Project.
Project
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
Google::Cloud::Bigquery::Project is the main object for interacting with Google BigQuery. Dataset objects are created, accessed, and deleted by Google::Cloud::Bigquery::Project.
Inherits
- Object
Example
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.table "my_table"
Methods
#copy
def copy(source_table, destination_table, create: nil, write: nil, &block) { |job| ... } -> Boolean
Copies the data from the source table to the destination table using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #copy_job for the asynchronous version. Use this method instead of Table#copy to copy from source tables in other projects.
The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.
-
source_table (String, Table) — The source table for the
copied data. This can be a table object; or a string ID as specified
by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). -
destination_table (String, Table) — The destination table for the
copied data. This can be a table object; or a string ID as specified
by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). -
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies how to handle data already present in the destination table. The default value is
empty
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- An error will be returned if the destination table already contains data.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::CopyJob::Updater) — a job configuration object for setting additional options.
-
(Boolean) — Returns
true
if the copy operation succeeded.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" destination_table = dataset.table "my_destination_table" bigquery.copy "bigquery-public-data.samples.shakespeare", destination_table
#copy_job
def copy_job(source_table, destination_table, create: nil, write: nil, job_id: nil, prefix: nil, labels: nil) { |job| ... } -> Google::Cloud::Bigquery::CopyJob
Copies the data from the source table to the destination table using an asynchronous method. In this method, a CopyJob is immediately returned. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling #Job#wait_until_done!. See #copy for the synchronous version. Use this method instead of Table#copy_job to copy from source tables in other projects.
The geographic location for the job ("US", "EU", etc.) can be set via CopyJob::Updater#location= in a block passed to this method.
-
source_table (String, Table) — The source table for the
copied data. This can be a table object; or a string ID as specified
by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). -
destination_table (String, Table) — The destination table for the
copied data. This can be a table object; or a string ID as specified
by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). -
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies how to handle data already present in the destination table. The default value is
empty
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- An error will be returned if the destination table already contains data.
-
job_id (String) (defaults to: nil) — A user-defined ID for the copy job. The ID
must contain only letters (
[A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be
prepended to a generated value to produce a unique job ID. For
example, the prefix
daily_import_job_
can be given to generate a job ID such asdaily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh
. The prefix must contain only letters ([A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length of the entire ID is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used. -
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::CopyJob::Updater) — a job configuration object for setting additional options.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" source_table_id = "bigquery-public-data.samples.shakespeare" destination_table = dataset.table "my_destination_table" copy_job = bigquery.copy_job source_table_id, destination_table copy_job.wait_until_done! copy_job.done? #=> true
#create_dataset
def create_dataset(dataset_id, name: nil, description: nil, expiration: nil, location: nil) { |access| ... } -> Google::Cloud::Bigquery::Dataset
Creates a new dataset.
-
dataset_id (String) — A unique ID for this dataset, without the
project name. The ID must contain only letters (
[A-Za-z]
), numbers ([0-9]
), or underscores (_
). The maximum length is 1,024 characters. - name (String) (defaults to: nil) — A descriptive name for the dataset.
- description (String) (defaults to: nil) — A user-friendly description of the dataset.
-
expiration (Integer) (defaults to: nil) — The default lifetime of all tables in the
dataset, in milliseconds. The minimum value is
3_600_000
(one hour). -
location (String) (defaults to: nil) — The geographic location where the dataset
should reside. Possible values include
EU
andUS
. The default value isUS
.
- (access) — a block for setting rules
- access (Google::Cloud::Bigquery::Dataset) — the object accepting rules
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.create_dataset "my_dataset"
A name and description can be provided:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.create_dataset "my_dataset", name: "My Dataset", description: "This is my Dataset"
Or, configure access with a block: (See Dataset::Access)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.create_dataset "my_dataset" do |dataset| dataset.access.add_writer_user "writers@example.com" end
#dataset
def dataset(dataset_id, skip_lookup: nil) -> Google::Cloud::Bigquery::Dataset, nil
Retrieves an existing dataset by ID.
- dataset_id (String) — The ID of a dataset.
-
skip_lookup (Boolean) (defaults to: nil) — Optionally create just a local reference
object without verifying that the resource exists on the BigQuery
service. Calls made on this object will raise errors if the resource
does not exist. Default is
false
. Optional.
-
(Google::Cloud::Bigquery::Dataset, nil) — Returns
nil
if the dataset does not exist.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" puts dataset.name
Avoid retrieving the dataset resource with skip_lookup
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset", skip_lookup: true
#datasets
def datasets(all: nil, filter: nil, token: nil, max: nil) -> Array<Google::Cloud::Bigquery::Dataset>
Retrieves the list of datasets belonging to the project.
-
all (Boolean) (defaults to: nil) — Whether to list all datasets, including hidden
ones. The default is
false
. -
filter (String) (defaults to: nil) — An expression for filtering the results of the
request by label. The syntax is
labels.<name>[:<value>]
. Multiple filters can beAND
ed together by connecting with a space. Example:labels.department:receiving labels.active
. See Filtering datasets using labels. - token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view.
- max (Integer) (defaults to: nil) — Maximum number of datasets to return.
- (Array<Google::Cloud::Bigquery::Dataset>) — (See Dataset::List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new datasets = bigquery.datasets datasets.each do |dataset| puts dataset.name end
Retrieve hidden datasets with the all
optional arg:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new all_datasets = bigquery.datasets all: true
Retrieve all datasets: (See Dataset::List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new datasets = bigquery.datasets datasets.all do |dataset| puts dataset.name end
#encryption
def encryption(kms_key: nil) -> Google::Cloud::Bigquery::EncryptionConfiguration
Creates a new Bigquery::EncryptionConfiguration instance.
This method does not execute an API call. Use the encryption configuration to encrypt a table when creating one via Bigquery::Dataset#create_table, Bigquery::Dataset#load, Bigquery::Table#copy, or Bigquery::Project#query.
- kms_key (String) (defaults to: nil) — Name of the Cloud KMS encryption key that will be used to protect the destination BigQuery table. The BigQuery Service Account associated with your project requires access to this encryption key.
Encrypt a new table
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name table = dataset.create_table "my_table" do |updater| updater.encryption = encrypt_config end
Encrypt a load destination table
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name job = dataset.load_job "my_table", "gs://abc/file" do |job| job.encryption = encrypt_config end
Encrypt a copy destination table
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" table = dataset.table "my_table" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name job = table.copy_job "my_dataset.new_table" do |job| job.encryption = encrypt_config end
Encrypt a query destination table
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d" encrypt_config = bigquery.encryption kms_key: key_name job = bigquery.query_job "SELECT 1;" do |query| query.table = dataset.table "my_table", skip_lookup: true query.encryption = encrypt_config end
#external
def external(url, format: nil) -> External::DataSource
Creates a new External::DataSource (or subclass) object that represents the external data source that can be queried from directly, even though the data is not stored in BigQuery. Instead of loading or streaming the data, this object references the external data source.
- url (String, Array<String>) — The fully-qualified URL(s) that point to your data in Google Cloud. An attempt will be made to derive the format from the URLs provided.
-
format (String|Symbol) (defaults to: nil) —
The data format. This value will be used even if the provided URLs are recognized as a different format. Optional.
The following values are supported:
csv
- CSVjson
- Newline-delimited JSONavro
- Avrosheets
- Google Sheetsdatastore_backup
- Cloud Datastore backupbigtable
- Bigtable
- (ext)
- (External::DataSource) — External data source.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new csv_url = "gs://bucket/path/to/data.csv" csv_table = bigquery.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end data = bigquery.query "SELECT * FROM my_ext_table", external: { my_ext_table: csv_table } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
#extract
def extract(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, &block) { |job| ... } -> Boolean
Extracts the data from a table or exports a model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See #extract_job for the asynchronous version.
Use this method instead of Table#extract or Model#extract to extract data from source tables or models in other projects.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.
-
source (Table, Model, String) — The source table or model for
the extract operation. This can be a table or model object; or a
table ID string as specified by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). - extract_url (Google::Cloud::Storage::File, String, Array<String>) — The Google Storage file or file URI pattern(s) to which BigQuery should extract. For a model export this value should be a string ending in an object name prefix, since multiple objects will be exported.
-
format (String) (defaults to: nil) —
The exported file format. The default value for tables is
csv
. Tables with nested or repeated fields cannot be exported as CSV. The default value for models isml_tf_saved_model
.Supported values for tables:
csv
- CSVjson
- Newline-delimited JSONavro
- Avro
Supported values for models:
ml_tf_saved_model
- TensorFlow SavedModelml_xgboost_booster
- XGBoost Booster
-
compression (String) (defaults to: nil) — The compression type to use for exported
files. Possible values include
GZIP
andNONE
. The default value isNONE
. Not applicable when extracting models. -
delimiter (String) (defaults to: nil) — Delimiter to use between fields in the
exported table data. Default is
,
. Not applicable when extracting models. -
header (Boolean) (defaults to: nil) — Whether to print out a header row in table
exports. Default is
true
. Not applicable when extracting models.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
-
(Boolean) — Returns
true
if the extract operation succeeded.
Export table data
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new bigquery.extract "bigquery-public-data.samples.shakespeare", "gs://my-bucket/shakespeare.csv"
Export a model
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" model = dataset.model "my_model" bigquery.extract model, "gs://my-bucket/#{model.model_id}"
#extract_job
def extract_job(source, extract_url, format: nil, compression: nil, delimiter: nil, header: nil, job_id: nil, prefix: nil, labels: nil) { |job| ... } -> Google::Cloud::Bigquery::ExtractJob
Extracts the data from a table or exports a model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling
Job#wait_until_done!. See #extract for the synchronous version.
Use this method instead of Table#extract_job or Model#extract_job to extract data from source tables or models in other projects.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method.
-
source (Table, Model, String) — The source table or model for
the extract operation. This can be a table or model object; or a
table ID string as specified by the Standard SQL Query
Reference
(
project-name.dataset_id.table_id
) or the Legacy SQL Query Reference (project-name:dataset_id.table_id
). - extract_url (Google::Cloud::Storage::File, String, Array<String>) — The Google Storage file or file URI pattern(s) to which BigQuery should extract. For a model export this value should be a string ending in an object name prefix, since multiple objects will be exported.
-
format (String) (defaults to: nil) —
The exported file format. The default value for tables is
csv
. Tables with nested or repeated fields cannot be exported as CSV. The default value for models isml_tf_saved_model
.Supported values for tables:
csv
- CSVjson
- Newline-delimited JSONavro
- Avro
Supported values for models:
ml_tf_saved_model
- TensorFlow SavedModelml_xgboost_booster
- XGBoost Booster
-
compression (String) (defaults to: nil) — The compression type to use for exported
files. Possible values include
GZIP
andNONE
. The default value isNONE
. Not applicable when extracting models. -
delimiter (String) (defaults to: nil) — Delimiter to use between fields in the
exported table data. Default is
,
. Not applicable when extracting models. -
header (Boolean) (defaults to: nil) — Whether to print out a header row in table
exports. Default is
true
. Not applicable when extracting models. -
job_id (String) (defaults to: nil) — A user-defined ID for the extract job. The ID
must contain only letters (
[A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be
prepended to a generated value to produce a unique job ID. For
example, the prefix
daily_import_job_
can be given to generate a job ID such asdaily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh
. The prefix must contain only letters ([A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length of the entire ID is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used. -
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
Export table data
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new table_id = "bigquery-public-data.samples.shakespeare" extract_job = bigquery.extract_job table_id, "gs://my-bucket/shakespeare.csv" extract_job.wait_until_done! extract_job.done? #=> true
Export a model
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new dataset = bigquery.dataset "my_dataset" model = dataset.model "my_model" extract_job = bigquery.extract model, "gs://my-bucket/#{model.model_id}"
#initialize
def initialize(service) -> Project
- (Project) — a new instance of Project
#job
def job(job_id, location: nil) -> Google::Cloud::Bigquery::Job, nil
Retrieves an existing job by ID.
- job_id (String) — The ID of a job.
- location (String) (defaults to: nil) — The geographic location where the job was created. Required except for US and EU.
-
(Google::Cloud::Bigquery::Job, nil) — Returns
nil
if the job does not exist.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.job "my_job"
#jobs
def jobs(all: nil, token: nil, max: nil, filter: nil, min_created_at: nil, max_created_at: nil, parent_job: nil) -> Array<Google::Cloud::Bigquery::Job>
Retrieves the list of jobs belonging to the project.
-
all (Boolean) (defaults to: nil) — Whether to display jobs owned by all users in the
project. The default is
false
. Optional. - token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view. Optional.
- max (Integer) (defaults to: nil) — Maximum number of jobs to return. Optional.
-
filter (String) (defaults to: nil) —
A filter for job state. Optional.
Acceptable values are:
done
- Finished jobspending
- Pending jobsrunning
- Running jobs
- min_created_at (Time) (defaults to: nil) — Min value for Job#created_at. When provided, only jobs created after or at this time are returned. Optional.
- max_created_at (Time) (defaults to: nil) — Max value for Job#created_at. When provided, only jobs created before or at this time are returned. Optional.
- parent_job (Google::Cloud::Bigquery::Job, String) (defaults to: nil) — A job object or a job ID. If set, retrieve only child jobs of the specified parent. Optional. See Job#job_id, Job#num_child_jobs, and Job#parent_job_id.
- (Array<Google::Cloud::Bigquery::Job>) — (See Job::List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new jobs = bigquery.jobs jobs.each do |job| # process job end
Retrieve only running jobs using the filter
optional arg:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new running_jobs = bigquery.jobs filter: "running" running_jobs.each do |job| # process job end
Retrieve only jobs created within provided times:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new two_days_ago = Time.now - 60*60*24*2 three_days_ago = Time.now - 60*60*24*3 jobs = bigquery.jobs min_created_at: three_days_ago, max_created_at: two_days_ago jobs.each do |job| # process job end
Retrieve all jobs: (See Job::List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new jobs = bigquery.jobs jobs.all do |job| # process job end
Retrieve child jobs by setting parent_job
:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new multi_statement_sql = <<~SQL -- Declare a variable to hold names as an array. DECLARE top_names ARRAY<STRING>; -- Build an array of the top 100 names from the year 2017. SET top_names = ( SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100) FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE year = 2017 ); -- Which names appear as words in Shakespeare's plays? SELECT name AS shakespeare_name FROM UNNEST(top_names) AS name WHERE name IN ( SELECT word FROM `bigquery-public-data.samples.shakespeare` ); SQL job = bigquery.query_job multi_statement_sql job.wait_until_done! child_jobs = bigquery.jobs parent_job: job child_jobs.each do |child_job| script_statistics = child_job.script_statistics puts script_statistics.evaluation_kind script_statistics.stack_frames.each do |stack_frame| puts stack_frame.text end end
#name
def name() -> String, nil
The descriptive name of the project. Can only be present if the project was retrieved with #projects.
- (String, nil) — the current value of name
#numeric_id
def numeric_id() -> Integer, nil
The numeric ID of the project. Can only be present if the project was retrieved with #projects.
- (Integer, nil) — the current value of numeric_id
#project
def project()
The BigQuery project connected to.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new( project_id: "my-project", credentials: "/path/to/keyfile.json" ) bigquery.project_id #=> "my-project"
#project_id
def project_id()
The BigQuery project connected to.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new( project_id: "my-project", credentials: "/path/to/keyfile.json" ) bigquery.project_id #=> "my-project"
#projects
def projects(token: nil, max: nil) -> Array<Google::Cloud::Bigquery::Project>
Retrieves the list of all projects for which the currently authorized account has been granted any project role. The returned project instances share the same credentials as the project used to retrieve them, but lazily create a new API connection for interactions with the BigQuery service.
- token (String) (defaults to: nil) — A previously-returned page token representing part of the larger set of results to view.
- max (Integer) (defaults to: nil) — Maximum number of projects to return.
- (Array<Google::Cloud::Bigquery::Project>) — (See List)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new projects = bigquery.projects projects.each do |project| puts project.name project.datasets.all.each do |dataset| puts dataset.name end end
Retrieve all projects: (See List#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new projects = bigquery.projects projects.all do |project| puts project.name project.datasets.all.each do |dataset| puts dataset.name end end
#query
def query(query, params: nil, types: nil, external: nil, max: nil, cache: true, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, session_id: nil, &block) { |job| ... } -> Google::Cloud::Bigquery::Data
Queries data and waits for the results. In this method, a QueryJob is created and its results are saved to a temporary table, then read from the table. Timeouts and transient errors are generally handled as needed to complete the query. When used for executing DDL/DML statements, this method does not return row data.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.
- query (String) — A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
-
params (Array, Hash) (defaults to: nil) — Standard SQL only. Used to pass query arguments when the
query
string contains either positional (?
) or named (@myparam
) query parameters. If value passed is an array["foo"]
, the query must use positional query parameters. If value passed is a hash{ myparam: "foo" }
, the query must use named query parameters. When set,legacy_sql
will automatically be set to false andstandard_sql
to true.BigQuery types are converted from Ruby types as follows:
| BigQuery | Ruby | Notes | |--------------|--------------------------------------|----------------------------------------------------| |
BOOL
|true
/false
| | |INT64
|Integer
| | |FLOAT64
|Float
| | |NUMERIC
|BigDecimal
|BigDecimal
values will be rounded to scale 9. | |BIGNUMERIC
|BigDecimal
| NOT AUTOMATIC: Must be mapped usingtypes
, below.| |STRING
|String
| | |DATETIME
|DateTime
|DATETIME
does not support time zone. | |DATE
|Date
| | |GEOGRAPHY
|String
(WKT or GeoJSON) | NOT AUTOMATIC: Must be mapped usingtypes
, below.| |TIMESTAMP
|Time
| | |TIME
|Google::Cloud::BigQuery::Time
| | |BYTES
|File
,IO
,StringIO
, or similar | | |ARRAY
|Array
| Nested arrays,nil
values are not supported. | |STRUCT
|Hash
| Hash keys may be strings or symbols. |See Data Types for an overview of each BigQuery data type, including allowed values. For the
GEOGRAPHY
type, see Working with BigQuery GIS data. -
types (Array, Hash) (defaults to: nil) — Standard SQL only. Types of the SQL parameters in
params
. It is not always possible to infer the right SQL type from a value inparams
. In these cases,types
must be used to specify the SQL type for these values.Arguments must match the value type passed to
params
. This must be anArray
when the query uses positional query parameters. This must be anHash
when the query uses named query parameters. The values should be BigQuery type codes from the following list::BOOL
:INT64
:FLOAT64
:NUMERIC
:BIGNUMERIC
:STRING
:DATETIME
:DATE
:GEOGRAPHY
:TIMESTAMP
:TIME
:BYTES
Array
- Lists are specified by providing the type code in an array. For example, an array of integers are specified as[:INT64]
.Hash
- Types for STRUCT values (Hash
objects) are specified using aHash
object, where the keys match theparams
hash, and the values are the types value that matches the data.
Types are optional.
- external (Hash<String|Symbol, External::DataSource>) (defaults to: nil) — A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.
- max (Integer) (defaults to: nil) — The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
- cache (Boolean) (defaults to: true) — Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
- dataset (String) (defaults to: nil) — Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'.
-
project (String) (defaults to: nil) — Specifies the default projectId to assume for
any unqualified table names in the query. Only used if
dataset
option is set. -
standard_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's
standard
SQL
dialect for this query. If set to true, the query will use standard
SQL rather than the legacy
SQL
dialect. When set to true, the values of
large_results
andflatten
are ignored; the query will be run as iflarge_results
is true andflatten
is false. Optional. The default value is true. -
legacy_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's
legacy
SQL
dialect for this query. If set to false, the query will use
BigQuery's standard
SQL
When set to false, the values of
large_results
andflatten
are ignored; the query will be run as iflarge_results
is true andflatten
is false. Optional. The default value is false. -
session_id (String) (defaults to: nil) — The ID of an existing session. See the
create_session
param in #query_job and Job#session_id.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::QueryJob::Updater) — a job configuration object for setting additional options for the query.
Query using standard SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new sql = "SELECT name FROM `my_project.my_dataset.my_table`" data = bigquery.query sql # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using legacy SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new sql = "SELECT name FROM [my_project:my_dataset.my_table]" data = bigquery.query sql, legacy_sql: true # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Retrieve all rows: (See Data#all)
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "SELECT name FROM `my_dataset.my_table`" data.all do |row| puts row[:name] end
Query using positional query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = ?", params: [1] # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using named query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id = @id", params: { id: 1 } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Query using named query parameters with types:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)", params: { ids: [] }, types: { ids: [:INT64] } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Execute a DDL statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "CREATE TABLE `my_dataset.my_table` (x INT64)" table_ref = data.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE
Execute a DML statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new data = bigquery.query "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL" puts data.num_dml_affected_rows
Query using external data source, set destination:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new csv_url = "gs://bucket/path/to/data.csv" csv_table = bigquery.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end data = bigquery.query "SELECT * FROM my_ext_table" do |query| query.external = { my_ext_table: csv_table } dataset = bigquery.dataset "my_dataset", skip_lookup: true query.table = dataset.table "my_table", skip_lookup: true end # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
#query_job
def query_job(query, params: nil, types: nil, external: nil, priority: "INTERACTIVE", cache: true, table: nil, create: nil, write: nil, dryrun: nil, dataset: nil, project: nil, standard_sql: nil, legacy_sql: nil, large_results: nil, flatten: nil, maximum_billing_tier: nil, maximum_bytes_billed: nil, job_id: nil, prefix: nil, labels: nil, udfs: nil, create_session: nil, session_id: nil) { |job| ... } -> Google::Cloud::Bigquery::QueryJob
Queries data by creating a query job.
The geographic location for the job ("US", "EU", etc.) can be set via QueryJob::Updater#location= in a block passed to this method.
- query (String) — A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".
-
params (Array, Hash) (defaults to: nil) — Standard SQL only. Used to pass query arguments when the
query
string contains either positional (?
) or named (@myparam
) query parameters. If value passed is an array["foo"]
, the query must use positional query parameters. If value passed is a hash{ myparam: "foo" }
, the query must use named query parameters. When set,legacy_sql
will automatically be set to false andstandard_sql
to true.BigQuery types are converted from Ruby types as follows:
| BigQuery | Ruby | Notes | |--------------|--------------------------------------|----------------------------------------------------| |
BOOL
|true
/false
| | |INT64
|Integer
| | |FLOAT64
|Float
| | |NUMERIC
|BigDecimal
|BigDecimal
values will be rounded to scale 9. | |BIGNUMERIC
|BigDecimal
| NOT AUTOMATIC: Must be mapped usingtypes
, below.| |STRING
|String
| | |DATETIME
|DateTime
|DATETIME
does not support time zone. | |DATE
|Date
| | |GEOGRAPHY
|String
(WKT or GeoJSON) | NOT AUTOMATIC: Must be mapped usingtypes
, below.| |TIMESTAMP
|Time
| | |TIME
|Google::Cloud::BigQuery::Time
| | |BYTES
|File
,IO
,StringIO
, or similar | | |ARRAY
|Array
| Nested arrays,nil
values are not supported. | |STRUCT
|Hash
| Hash keys may be strings or symbols. |See Data Types for an overview of each BigQuery data type, including allowed values. For the
GEOGRAPHY
type, see Working with BigQuery GIS data. -
types (Array, Hash) (defaults to: nil) — Standard SQL only. Types of the SQL parameters in
params
. It is not always possible to infer the right SQL type from a value inparams
. In these cases,types
must be used to specify the SQL type for these values.Arguments must match the value type passed to
params
. This must be anArray
when the query uses positional query parameters. This must be anHash
when the query uses named query parameters. The values should be BigQuery type codes from the following list::BOOL
:INT64
:FLOAT64
:NUMERIC
:BIGNUMERIC
:STRING
:DATETIME
:DATE
:GEOGRAPHY
:TIMESTAMP
:TIME
:BYTES
Array
- Lists are specified by providing the type code in an array. For example, an array of integers are specified as[:INT64]
.Hash
- Types for STRUCT values (Hash
objects) are specified using aHash
object, where the keys match theparams
hash, and the values are the types value that matches the data.
Types are optional.
- external (Hash<String|Symbol, External::DataSource>) (defaults to: nil) — A Hash that represents the mapping of the external tables to the table names used in the SQL query. The hash keys are the table names, and the hash values are the external table objects. See #query.
-
priority (String) (defaults to: "INTERACTIVE") — Specifies a priority for the query. Possible
values include
INTERACTIVE
andBATCH
. The default value isINTERACTIVE
. - cache (Boolean) (defaults to: true) — Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching.
- table (Table) (defaults to: nil) — The destination table where the query results should be stored. If not present, a new table will be created to store the results.
-
create (String) (defaults to: nil) —
Specifies whether the job is allowed to create new tables. The default value is
needed
.The following values are supported:
needed
- Create the table if it does not exist.never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
write (String) (defaults to: nil) —
Specifies the action that occurs if the destination table already exists. The default value is
empty
.The following values are supported:
truncate
- BigQuery overwrites the table data.append
- BigQuery appends the data to the table.empty
- A 'duplicate' error is returned in the job result if the table exists and contains data.
- dryrun (Boolean) (defaults to: nil) — If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false.
- dataset (Dataset, String) (defaults to: nil) — The default dataset to use for unqualified table names in the query. Optional.
-
project (String) (defaults to: nil) — Specifies the default projectId to assume for
any unqualified table names in the query. Only used if
dataset
option is set. - standard_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's standard SQL dialect for this query. If set to true, the query will use standard SQL rather than the legacy SQL dialect. Optional. The default value is true.
- legacy_sql (Boolean) (defaults to: nil) — Specifies whether to use BigQuery's legacy SQL dialect for this query. If set to false, the query will use BigQuery's standard SQL dialect. Optional. The default value is false.
-
large_results (Boolean) (defaults to: nil) — This option is specific to Legacy SQL.
If
true
, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requirestable
parameter to be set. -
flatten (Boolean) (defaults to: nil) — This option is specific to Legacy SQL.
Flattens all nested and repeated fields in the query results. The
default value is
true
.large_results
parameter must betrue
if this is set tofalse
. - maximum_billing_tier (Integer) (defaults to: nil) — Limits the billing tier for this job. Queries that have resource usage beyond this tier will fail (without incurring a charge). WARNING: The billed byte amount can be multiplied by an amount up to this number! Most users should not need to alter this setting, and we recommend that you avoid introducing new uses of it. Deprecated.
- maximum_bytes_billed (Integer) (defaults to: nil) — Limits the bytes billed for this job. Queries that will have bytes billed beyond this limit will fail (without incurring a charge). Optional. If unspecified, this will be set to your project default.
-
job_id (String) (defaults to: nil) — A user-defined ID for the query job. The ID
must contain only letters (
[A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be
prepended to a generated value to produce a unique job ID. For
example, the prefix
daily_import_job_
can be given to generate a job ID such asdaily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh
. The prefix must contain only letters ([A-Za-z]
), numbers ([0-9]
), underscores (_
), or dashes (-
). The maximum length of the entire ID is 1,024 characters. Ifjob_id
is provided, thenprefix
will not be used.See Generating a job ID.
-
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
-
udfs (Array<String>, String) (defaults to: nil) — User-defined function resources
used in a legacy SQL query. May be either a code resource to load from
a Google Cloud Storage URI (
gs://bucket/path
), or an inline resource that contains code for a user-defined function (UDF). Providing an inline code resource is equivalent to providing a URI for a file containing the same code.This parameter is used for defining User Defined Function (UDF) resources only when using legacy SQL. Users of standard SQL should leverage either DDL (e.g.
CREATE [TEMPORARY] FUNCTION ...
) or the Routines API to define UDF resources.For additional information on migrating, see: Migrating to standard SQL - Differences in user-defined JavaScript functions
-
create_session (Boolean) (defaults to: nil) — If true, creates a new session, where the
session ID will be a server generated random id. If false, runs query
with an existing session ID when one is provided in the
session_id
param, otherwise runs query in non-session mode. See Job#session_id. The default value is false. -
session_id (String) (defaults to: nil) — The ID of an existing session. See also the
create_session
param and Job#session_id.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::QueryJob::Updater) — a job configuration object for setting query options.
Query using standard SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "SELECT name FROM `my_project.my_dataset.my_table`" job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using legacy SQL:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "SELECT name FROM [my_project:my_dataset.my_table]", legacy_sql: true job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using positional query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = ?", params: [1] job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using named query parameters:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id = @id", params: { id: 1 } job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Query using named query parameters with types:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "SELECT name FROM `my_dataset.my_table` WHERE id IN UNNEST(@ids)", params: { ids: [] }, types: { ids: [:INT64] } job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
Execute a DDL statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "CREATE TABLE`my_dataset.my_table` (x INT64)" job.wait_until_done! if !job.failed? table_ref = job.ddl_target_table # Or ddl_target_routine for CREATE/DROP FUNCTION/PROCEDURE end
Execute a DML statement:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new job = bigquery.query_job "UPDATE `my_dataset.my_table` SET x = x + 1 WHERE x IS NOT NULL" job.wait_until_done! if !job.failed? puts job.num_dml_affected_rows end
Query using external data source, set destination:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new csv_url = "gs://bucket/path/to/data.csv" csv_table = bigquery.external csv_url do |csv| csv.autodetect = true csv.skip_leading_rows = 1 end job = bigquery.query_job "SELECT * FROM my_ext_table" do |query| query.external = { my_ext_table: csv_table } dataset = bigquery.dataset "my_dataset", skip_lookup: true query.table = dataset.table "my_table", skip_lookup: true end job.wait_until_done! if !job.failed? job.data.each do |row| puts row[:name] end end
#schema
def schema() { |schema| ... } -> Google::Cloud::Bigquery::Schema
Creates a new schema instance. An optional block may be given to configure the schema, otherwise the schema is returned empty and may be configured directly.
The returned schema can be passed to Dataset#load using the
schema
option. However, for most use cases, the block yielded by
Dataset#load is a more convenient way to configure the schema
for the destination table.
- (schema) — a block for setting the schema
- schema (Schema) — the object accepting the schema
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new schema = bigquery.schema do |s| s.string "first_name", mode: :required s.record "cities_lived", mode: :repeated do |nested_schema| nested_schema.string "place", mode: :required nested_schema.integer "number_of_years", mode: :required end end dataset = bigquery.dataset "my_dataset" gs_url = "gs://my-bucket/file-name.csv" load_job = dataset.load_job "my_new_table", gs_url, schema: schema
#service_account_email
def service_account_email() -> String
The email address of the service account for the project used to connect to BigQuery. (See also #project_id.)
- (String) — The service account email address.
#time
def time(hour, minute, second) -> Bigquery::Time
Creates a Bigquery::Time object to represent a time, independent of a specific date.
- hour (Integer) — Hour, valid values from 0 to 23.
- minute (Integer) — Minute, valid values from 0 to 59.
- second (Integer, Float) — Second, valid values from 0 to 59. Can contain microsecond precision.
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new fourpm = bigquery.time 16, 0, 0 data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date = @time", params: { time: fourpm } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?
Create Time with fractional seconds:
require "google/cloud/bigquery" bigquery = Google::Cloud::Bigquery.new precise_time = bigquery.time 16, 35, 15.376541 data = bigquery.query "SELECT name FROM `my_dataset.my_table` WHERE time_of_date >= @time", params: { time: precise_time } # Iterate over the first page of results data.each do |row| puts row[:name] end # Retrieve the next page of results data = data.next if data.next?