Reference documentation and code samples for the google-cloud-bigquery class Google::Cloud::Bigquery::Model.
Model
A model in BigQuery ML represents what an ML system has learned from the training data.
The following types of models are supported by BigQuery ML:
- Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
- Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
- Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross entropy loss function.
- K-means clustering for data segmentation (beta); for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor a split of the data for training and evaluation.
In BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and for prediction.
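Models themselves are created by running a CREATE MODEL SQL statement through a query job rather than through this class. As a rough illustration, a statement like that could be composed as below; the dataset, model, table, and column names here are hypothetical placeholders, not names from this documentation:

```ruby
# Build a BigQuery ML CREATE MODEL DDL statement as a string.
# All identifiers (dataset, model, table, columns) are placeholders.
def create_model_sql dataset_id, model_id, model_type: "linear_reg"
  "CREATE MODEL `#{dataset_id}.#{model_id}` " \
  "OPTIONS (model_type = '#{model_type}') AS " \
  "SELECT label, feature_1, feature_2 FROM `#{dataset_id}.training_data`"
end

sql = create_model_sql "my_dataset", "my_model"
# The statement could then be run with a query job, e.g. bigquery.query sql
```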
Inherits
- Object
Example
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"
Methods
#created_at
def created_at() -> Time, nil
The time when this model was created.
- (Time, nil) — The creation time, or nil if the object is a reference (see #reference?).
#dataset_id
def dataset_id() -> String
The ID of the Dataset
containing this model.
- (String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.
#delete
def delete() -> Boolean
Permanently deletes the model.
- (Boolean) — Returns true if the model was deleted.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.delete
#description
def description() -> String, nil
A user-friendly description of the model.
- (String, nil) — The description, or nil if the object is a reference (see #reference?).
#description=
def description=(new_description)
Updates the user-friendly description of the model.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_description (String) — The new user-friendly description.
#encryption
def encryption() -> EncryptionConfiguration, nil
The EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.
Present only if this model is using custom encryption.
- (EncryptionConfiguration, nil) — The encryption configuration.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

encrypt_config = model.encryption
#encryption=
def encryption=(value)
Sets the EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.
Present only if this model is using custom encryption.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- value (EncryptionConfiguration) — The new encryption config.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name
model.encryption = encrypt_config
#etag
def etag() -> String, nil
The ETag hash of the model.
- (String, nil) — The ETag hash, or nil if the object is a reference (see #reference?).
#exists?
def exists?(force: false) -> Boolean
Determines whether the model exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.
- force (Boolean) (defaults to: false) — Force the latest resource representation to be retrieved from the BigQuery service when true. Otherwise the return value of this method will be memoized to reduce the number of API calls made to the BigQuery service. The default is false.
- (Boolean) — true when the model exists in the BigQuery service, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.exists? #=> true
#expires_at
def expires_at() -> Time, nil
The time when this model expires. If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.
- (Time, nil) — The expiration time, or nil if not present or the object is a reference (see #reference?).
#expires_at=
def expires_at=(new_expires_at)
Updates the time when this model expires.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_expires_at (Integer) — The new time when this model expires.
#extract
def extract(extract_url, format: nil, &block) { |job| ... } -> Boolean
Exports the model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.
- extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
-
format (String) (defaults to: nil) — The exported file format. The default value is ml_tf_saved_model. The following values are supported:
  - ml_tf_saved_model - TensorFlow SavedModel
  - ml_xgboost_booster - XGBoost Booster
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
- (Boolean) — Returns true if the extract operation succeeded.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.extract "gs://my-bucket/#{model.model_id}"
#extract_job
def extract_job(extract_url, format: nil, job_id: nil, prefix: nil, labels: nil) { |job| ... } -> Google::Cloud::Bigquery::ExtractJob
Exports the model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.
The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.
- extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
-
format (String) (defaults to: nil) — The exported file format. The default value is ml_tf_saved_model. The following values are supported:
  - ml_tf_saved_model - TensorFlow SavedModel
  - ml_xgboost_booster - XGBoost Booster
- job_id (String) (defaults to: nil) — A user-defined ID for the extract job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used. See Generating a job ID.
- prefix (String) (defaults to: nil) — A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
- labels (Hash) (defaults to: nil) — A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.
The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
- (job) — a job configuration object
- job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"
extract_job.wait_until_done!
extract_job.done? #=> true
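The job_id and prefix rules described above can be sketched locally; the generator and validator below are illustrative helpers under the documented constraints, not part of the library:

```ruby
require "securerandom"

# Valid job IDs: letters, numbers, underscores, or dashes; max 1,024 chars.
JOB_ID_RE = /\A[A-Za-z0-9_-]{1,1024}\z/

def valid_job_id? id
  !!(id =~ JOB_ID_RE)
end

# Mimic the documented prefix behavior: prepend the prefix to a
# generated value to produce a unique job ID. urlsafe_base64 emits
# only characters from the allowed set (A-Za-z0-9_-).
def generate_job_id prefix
  "#{prefix}#{SecureRandom.urlsafe_base64(21)}"
end
```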
#feature_columns
def feature_columns() -> Array<StandardSql::Field>
The input feature columns that were used to train this model.
- (Array<StandardSql::Field>)
#label_columns
def label_columns() -> Array<StandardSql::Field>
The label columns that were used to train this model. In the model's output, the names of these columns are prefixed with "predicted_".
- (Array<StandardSql::Field>)
#labels
def labels() -> Hash<String, String>, nil
A hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.
The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.
- (Hash<String, String>, nil) — A hash containing key/value pairs.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

labels = model.labels
#labels=
def labels=(new_labels)
Updates the hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
-
new_labels (Hash<String, String>) — A hash containing key/value pairs. The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.labels = { "env" => "production" }
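The label requirements listed above can be checked locally before calling the setter; this validator is an illustrative sketch of the documented rules, not part of the library:

```ruby
# Check a labels hash against the documented BigQuery label rules:
# at most 64 labels; keys 1-63 chars starting with a lowercase letter
# (international lowercase letters allowed); values 0-63 chars;
# lowercase letters, digits, underscores, and dashes only.
LABEL_KEY_RE   = /\A\p{Ll}[\p{Ll}0-9_-]{0,62}\z/
LABEL_VALUE_RE = /\A[\p{Ll}0-9_-]{0,63}\z/

def valid_labels? labels
  return false if labels.size > 64
  labels.all? do |key, value|
    key.to_s.match?(LABEL_KEY_RE) && value.to_s.match?(LABEL_VALUE_RE)
  end
end
```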
#location
def location() -> String, nil
The geographic location where the model should reside. Possible values include EU and US. The default value is US.
- (String, nil) — The location code.
#model_id
def model_id() -> String
A unique ID for this model.
- (String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.
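The ID constraints above (letters, numbers, underscores; up to 1,024 characters) can be expressed as a simple local check; this helper is illustrative, not part of the library:

```ruby
# Model IDs: letters, numbers, or underscores; max 1,024 characters.
MODEL_ID_RE = /\A[A-Za-z0-9_]{1,1024}\z/

def valid_model_id? id
  !!(id =~ MODEL_ID_RE)
end
```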
#model_type
def model_type() -> String, nil
Type of the model resource. Expected to be one of the following:
- LINEAR_REGRESSION - Linear regression model.
- LOGISTIC_REGRESSION - Logistic regression based classification model.
- KMEANS - K-means clustering model (beta).
- TENSORFLOW - An imported TensorFlow model (beta).
- (String, nil) — The model type, or nil if the object is a reference (see #reference?).
#modified_at
def modified_at() -> Time, nil
The time when this model was last modified.
- (Time, nil) — The last modified time, or nil if not present or the object is a reference (see #reference?).
#name
def name() -> String, nil
The name of the model.
- (String, nil) — The friendly name, or nil if the object is a reference (see #reference?).
#name=
def name=(new_name)
Updates the name of the model.
If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.
- new_name (String) — The new friendly name.
#project_id
def project_id() -> String
The ID of the Project
containing this model.
- (String) — The project ID.
#reference?
def reference?() -> Boolean
Whether the model was created without retrieving the resource representation from the BigQuery service.
- (Boolean) — true when the model is just a local reference object, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.reference? #=> false
#refresh!
def refresh!() -> Google::Cloud::Bigquery::Model
Reloads the model with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Model) — Returns the reloaded model.
Skip retrieving the model from the service, then load it:
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true
#reload!
def reload!() -> Google::Cloud::Bigquery::Model
Reloads the model with current data from the BigQuery service.
- (Google::Cloud::Bigquery::Model) — Returns the reloaded model.
Skip retrieving the model from the service, then load it:
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true
#resource?
def resource?() -> Boolean
Whether the model was created with a resource representation from the BigQuery service.
- (Boolean) — true when the model was created with a resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.resource? #=> false
model.reload!
model.resource? #=> true
#resource_full?
def resource_full?() -> Boolean
Whether the model was created with a full resource representation from the BigQuery service.
- (Boolean) — true when the model was created with a full resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.resource_full? #=> true
#resource_partial?
def resource_partial?() -> Boolean
Whether the model was created with a partial resource representation from the BigQuery service by retrieval through Dataset#models. See Models: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.
- (Boolean) — true when the model was created with a partial resource representation, false otherwise.
require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.models.first

model.resource_partial? #=> true
model.description # Loads the full resource.
model.resource_partial? #=> false
#training_runs
def training_runs() -> Array<Google::Cloud::Bigquery::Model::TrainingRun>
Information for all training runs in increasing order of startTime.
- (Array<Google::Cloud::Bigquery::Model::TrainingRun>)