BigQuery API - Class Google::Cloud::Bigquery::Model (v1.41.0)

Reference documentation and code samples for the BigQuery API class Google::Cloud::Bigquery::Model.

Model

A model in BigQuery ML represents what an ML system has learned from the training data.

The following types of models are supported by BigQuery ML:

Linear regression for forecasting; for example, the sales of an item on a given day. Labels are real-valued (they cannot be +/- infinity or NaN).
Binary logistic regression for classification; for example, determining whether a customer will make a purchase. Labels must only have two possible values.
Multiclass logistic regression for classification. These models can be used to predict multiple possible values such as whether an input is "low-value," "medium-value," or "high-value." Labels can have up to 50 unique values. In BigQuery ML, multiclass logistic regression training uses a multinomial classifier with a cross entropy loss function.
K-means clustering for data segmentation (beta); for example, identifying customer segments. K-means is an unsupervised learning technique, so model training requires neither labels nor a split of the data for training and evaluation.

In BigQuery ML, a model can be used with data from multiple BigQuery datasets for training and for prediction.

Inherits

Object

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"

model = dataset.model "my_model"

Methods

#created_at

def created_at() -> Time, nil

The time when this model was created.

Returns

(Time, nil) — The creation time, or nil if the object is a reference (see #reference?).

#dataset_id

def dataset_id() -> String

The ID of the Dataset containing this model.

Returns

(String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.
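
The ID rule above can be checked locally before a request is sent. The following is a minimal sketch only: the helper name `valid_bigquery_id?` is hypothetical and not part of the library, and rejecting an empty string is an assumption, since the text above states only the allowed characters and the maximum length.

```ruby
# Hypothetical helper, not part of google-cloud-bigquery: checks a string
# against the ID rule stated above (letters, numbers, underscores; at most
# 1,024 characters). Rejecting the empty string is an assumption.
def valid_bigquery_id? id
  id.is_a?(String) && id.match?(/\A[A-Za-z0-9_]{1,1024}\z/)
end

puts valid_bigquery_id?("my_dataset")  # prints true
puts valid_bigquery_id?("my-dataset")  # prints false (dashes are not allowed)
```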

#delete

def delete() -> Boolean

Permanently deletes the model.

Returns

(Boolean) — Returns true if the model was deleted.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.delete

#description

def description() -> String, nil

A user-friendly description of the model.

Returns

(String, nil) — The description, or nil if the object is a reference (see #reference?).

#description=

def description=(new_description)

Updates the user-friendly description of the model.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameter

new_description (String) — The new user-friendly description.
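
The ETag-based optimistic concurrency control mentioned above follows a read-then-conditional-write pattern. The sketch below illustrates that pattern with an in-memory stand-in. `FakeModelStore` and `StaleETagError` are hypothetical names, not part of the library, and the real service performs this check over HTTP rather than in process.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a service that guards updates with
# ETags. None of these names come from google-cloud-bigquery.
class StaleETagError < StandardError; end

class FakeModelStore
  def initialize description
    @resource = { description: description, etag: SecureRandom.hex(8) }
  end

  # Returns a copy of the full resource, including its current ETag.
  def get
    @resource.dup
  end

  # Applies the change only if the caller's ETag matches the stored one,
  # then issues a fresh ETag for the updated resource.
  def patch etag, description
    raise StaleETagError, "resource changed since it was read" unless etag == @resource[:etag]
    @resource = { description: description, etag: SecureRandom.hex(8) }
    @resource.dup
  end
end

store = FakeModelStore.new "old description"

# Read the full representation first so the update carries a current ETag.
current = store.get
updated = store.patch current[:etag], "new description"
puts updated[:description]

# Reusing the now-stale ETag is rejected instead of overwriting newer data.
begin
  store.patch current[:etag], "conflicting description"
rescue StaleETagError => e
  puts "rejected: #{e.message}"
end
```

This is why a reference or partial object has to be loaded in full before an update: without the current ETag there is nothing to compare against.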

#encryption

def encryption() -> EncryptionConfiguration, nil

The EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.

Present only if this model is using custom encryption.

Returns

(EncryptionConfiguration, nil) — The encryption configuration.


Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

encrypt_config = model.encryption

#encryption=

def encryption=(value)

Set the EncryptionConfiguration object that represents the custom encryption method used to protect this model. If not set, Dataset#default_encryption is used.

Present only if this model is using custom encryption.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameter

value (EncryptionConfiguration) — The new encryption config.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

key_name = "projects/a/locations/b/keyRings/c/cryptoKeys/d"
encrypt_config = bigquery.encryption kms_key: key_name

model.encryption = encrypt_config

#etag

def etag() -> String, nil

The ETag hash of the model.

Returns

(String, nil) — The ETag hash, or nil if the object is a reference (see #reference?).

#exists?

def exists?(force: false) -> Boolean

Determines whether the model exists in the BigQuery service. The result is cached locally. To refresh state, set force to true.

Parameter

force (Boolean) (defaults to: false) — Force the latest resource representation to be retrieved from the BigQuery service when true. Otherwise the return value of this method will be memoized to reduce the number of API calls made to the BigQuery service. The default is false.

Returns

(Boolean) — true when the model exists in the BigQuery service, false otherwise.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true
model.exists? #=> true
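
The effect of the force parameter can be pictured with a small memoizing wrapper. This is a sketch under stated assumptions, not the library's implementation: `CachedExistence` is a hypothetical class, and the block passed to it stands in for an API call to the BigQuery service.

```ruby
# Hypothetical sketch of a memoized existence check with a force flag.
# The lookup block stands in for an API call to the service.
class CachedExistence
  attr_reader :lookups

  def initialize &lookup
    @lookup = lookup
    @lookups = 0
    @exists = nil
  end

  def exists? force: false
    # Reuse the cached answer unless it is missing or a refresh is forced.
    return @exists unless force || @exists.nil?
    @lookups += 1
    @exists = @lookup.call ? true : false
  end
end

check = CachedExistence.new { true }
check.exists?
check.exists?
puts check.lookups   # prints 1: the second call used the cached value
check.exists? force: true
puts check.lookups   # prints 2: force triggered a new lookup
```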

#expires_at

def expires_at() -> Time, nil

The time when this model expires. If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed.

Returns

(Time, nil) — The expiration time, or nil if not present or the object is a reference (see #reference?).

#expires_at=

def expires_at=(new_expires_at)

Updates the time when this model expires.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameter

new_expires_at (Integer) — The new time when this model expires.

#extract

def extract(extract_url, format: nil, &block) { |job| ... } -> Boolean

Exports the model to Google Cloud Storage using a synchronous method that blocks for a response. Timeouts and transient errors are generally handled as needed to complete the job. See also #extract_job.

The geographic location for the job ("US", "EU", etc.) can be set via ExtractJob::Updater#location= in a block passed to this method. If the model is a full resource representation (see #resource_full?), the location of the job will automatically be set to the location of the model.

Parameters

extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
format (String) (defaults to: nil) —
The exported file format. The default value is ml_tf_saved_model.

The following values are supported:
- ml_tf_saved_model - TensorFlow SavedModel
- ml_xgboost_booster - XGBoost Booster

Yields

(job) — a job configuration object

Yield Parameter

job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.

Returns

(Boolean) — Returns true if the extract operation succeeded.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.extract "gs://my-bucket/#{model.model_id}"

#extract_job

def extract_job(extract_url, format: nil, job_id: nil, prefix: nil, labels: nil) { |job| ... } -> Google::Cloud::Bigquery::ExtractJob

Exports the model to Google Cloud Storage asynchronously, immediately returning an ExtractJob that can be used to track the progress of the export job. The caller may poll the service by repeatedly calling Job#reload! and Job#done? to detect when the job is done, or simply block until the job is done by calling Job#wait_until_done!. See also #extract.

Parameters

extract_url (String) — The Google Storage URI to which BigQuery should extract the model. This value should end in an object name prefix, since multiple objects will be exported.
format (String) (defaults to: nil) —
The exported file format. The default value is ml_tf_saved_model.

The following values are supported:
- ml_tf_saved_model - TensorFlow SavedModel
- ml_xgboost_booster - XGBoost Booster
job_id (String) (defaults to: nil) — A user-defined ID for the extract job. The ID must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length is 1,024 characters. If job_id is provided, then prefix will not be used.

See Generating a job ID.
prefix (String) (defaults to: nil) — A string, usually human-readable, that will be prepended to a generated value to produce a unique job ID. For example, the prefix daily_import_job_ can be given to generate a job ID such as daily_import_job_12vEDtMQ0mbp1Mo5Z7mzAFQJZazh. The prefix must contain only letters ([A-Za-z]), numbers ([0-9]), underscores (_), or dashes (-). The maximum length of the entire ID is 1,024 characters. If job_id is provided, then prefix will not be used.
labels (Hash) (defaults to: nil) —
A hash of user-provided labels associated with the job. You can use these to organize and group your jobs.

The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
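
The interaction between job_id and prefix described above can be sketched as follows. `build_job_id` is a hypothetical helper, not part of the library, and the 24-character random suffix is an assumption made for illustration; the text above does not specify how the generated value is produced.

```ruby
require "securerandom"

# Hypothetical helper illustrating the job_id/prefix rule described above.
# The suffix format is an assumption, not the library's actual scheme.
def build_job_id job_id: nil, prefix: nil
  # An explicit job_id wins; prefix is ignored in that case.
  id = job_id || "#{prefix}#{SecureRandom.alphanumeric 24}"
  unless id.match?(/\A[A-Za-z0-9_-]{1,1024}\z/)
    raise ArgumentError, "job ID must be 1 to 1,024 letters, numbers, underscores, or dashes"
  end
  id
end

puts build_job_id(prefix: "daily_import_job_")
puts build_job_id(job_id: "my_job", prefix: "ignored_")  # prints my_job
```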

Yields

(job) — a job configuration object

Yield Parameter

job (Google::Cloud::Bigquery::ExtractJob::Updater) — a job configuration object for setting additional options.

Returns

(Google::Cloud::Bigquery::ExtractJob)

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

extract_job = model.extract_job "gs://my-bucket/#{model.model_id}"

extract_job.wait_until_done!
extract_job.done? #=> true
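
As an alternative to blocking, the polling approach mentioned above (calling reload! and done? repeatedly) can be sketched with a stub. `StubJob` and `poll_until_done` are hypothetical; only the reload! and done? method names mirror the text above, and the interval and attempt limit are illustrative assumptions.

```ruby
# Hypothetical stand-in for a job object: it reports done after a fixed
# number of reloads. A real job would query the service instead.
class StubJob
  def initialize reloads_needed
    @remaining = reloads_needed
  end

  def reload!
    @remaining -= 1 if @remaining.positive?
    self
  end

  def done?
    @remaining.zero?
  end
end

# Polls until the job is done or the attempt budget runs out.
# Returns true when the job finished, false when giving up.
def poll_until_done job, interval: 0.01, max_attempts: 10
  max_attempts.times do
    return true if job.done?
    sleep interval
    job.reload!
  end
  job.done?
end

puts poll_until_done(StubJob.new(3))                    # prints true
puts poll_until_done(StubJob.new(50), max_attempts: 5)  # prints false
```

Bounding the number of attempts keeps a stuck job from blocking the caller indefinitely.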

#feature_columns

def feature_columns() -> Array<StandardSql::Field>

The input feature columns that were used to train this model.

Returns

(Array<StandardSql::Field>)

#label_columns

def label_columns() -> Array<StandardSql::Field>

The label columns that were used to train this model. In the model's output, each of these column names appears with a "predicted_" prefix.

Returns

(Array<StandardSql::Field>)

#labels

def labels() -> Hash<String, String>, nil

A hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.

The returned hash is frozen and changes are not allowed. Use #labels= to replace the entire hash.

Returns

(Hash<String, String>, nil) — A hash containing key/value pairs.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

labels = model.labels
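
Because the returned hash is frozen, adding a label means building a new hash and assigning it as a whole. The sketch below uses a plain frozen hash as a stand-in for the return value; `merged_labels` is a hypothetical helper, not part of the library.

```ruby
# Hypothetical helper: returns a new, unfrozen hash that combines the
# current labels with additions, leaving the frozen original untouched.
def merged_labels current, additions
  current.merge additions
end

labels = { "env" => "production" }.freeze  # stands in for the returned hash

begin
  labels["team"] = "ml"
rescue FrozenError => e
  puts "in-place change rejected: #{e.class}"
end

new_labels = merged_labels labels, "team" => "ml"
puts new_labels.frozen?   # prints false
```

The merged hash is what would then be passed to #labels= to replace the entire set.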

#labels=

def labels=(new_labels)

Updates the hash of user-provided labels associated with this model. Labels are used to organize and group models. See Using Labels.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameter

new_labels (Hash<String, String>) —
A hash containing key/value pairs. The labels applied to a resource must meet the following requirements:
- Each resource can have multiple labels, up to a maximum of 64.
- Each label must be a key-value pair.
- Keys have a minimum length of 1 character and a maximum length of 63 characters, and cannot be empty. Values can be empty, and have a maximum length of 63 characters.
- Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.
- The key portion of a label must be unique. However, you can use the same key with multiple resources.
- Keys must start with a lowercase letter or international character.
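
The requirements above can be checked locally before assigning a hash. This is a sketch, not a definitive implementation: `label_errors` and the two constants are hypothetical, and treating "international characters" as Unicode lowercase or caseless letters is an assumption of this example.

```ruby
# Hypothetical validator for the label requirements listed above; it is not
# part of google-cloud-bigquery. Treating "international characters" as
# Unicode lowercase or caseless letters is an assumption of this sketch.
LABEL_KEY   = /\A[\p{Ll}\p{Lo}][\p{Ll}\p{Lo}\p{N}_-]{0,62}\z/
LABEL_VALUE = /\A[\p{Ll}\p{Lo}\p{N}_-]{0,63}\z/

# Returns an array of problems; an empty array means the hash passed.
def label_errors labels
  errors = []
  errors << "more than 64 labels" if labels.size > 64
  labels.each do |key, value|
    errors << "invalid key: #{key.inspect}" unless key.is_a?(String) && LABEL_KEY.match?(key)
    errors << "invalid value for #{key.inspect}" unless value.is_a?(String) && LABEL_VALUE.match?(value)
  end
  errors
end

p label_errors({ "env" => "production" })  # prints []
p label_errors({ "Env" => "production" })  # reports the uppercase key
p label_errors({ "env" => "" })            # prints [] (empty values are allowed)
```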

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.labels = { "env" => "production" }

#location

def location() -> String, nil

The geographic location where the model should reside. Possible values include EU and US. The default value is US.

Returns

(String, nil) — The location code.

#model_id

def model_id() -> String

A unique ID for this model.

Returns

(String) — The ID must contain only letters ([A-Za-z]), numbers ([0-9]), or underscores (_). The maximum length is 1,024 characters.

#model_type

def model_type() -> String, nil

Type of the model resource. Expected to be one of the following:

LINEAR_REGRESSION - Linear regression model.
LOGISTIC_REGRESSION - Logistic regression based classification model.
KMEANS - K-means clustering model (beta).
TENSORFLOW - An imported TensorFlow model (beta).

Returns

(String, nil) — The model type, or nil if the object is a reference (see #reference?).

#modified_at

def modified_at() -> Time, nil

The time when this model was last modified.

Returns

(Time, nil) — The last modified time, or nil if not present or the object is a reference (see #reference?).

#name

def name() -> String, nil

The name of the model.

Returns

(String, nil) — The friendly name, or nil if the object is a reference (see #reference?).

#name=

def name=(new_name)

Updates the name of the model.

If the model is not a full resource representation (see #resource_full?), the full representation will be retrieved before the update to comply with ETag-based optimistic concurrency control.

Parameter

new_name (String) — The new friendly name.

#project_id

def project_id() -> String

The ID of the Project containing this model.

Returns

(String) — The project ID.

#reference?

def reference?() -> Boolean

Whether the model was created without retrieving the resource representation from the BigQuery service.

Returns

(Boolean) — true when the model is just a local reference object, false otherwise.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.reference? #=> false

#refresh!

def refresh!() -> Google::Cloud::Bigquery::Model

Alias Of: #reload!

Reloads the model with current data from the BigQuery service.

Returns

(Google::Cloud::Bigquery::Model) — Returns the reloaded model.

Example

Skip retrieving the model from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true

#reload!

def reload!() -> Google::Cloud::Bigquery::Model

Aliases

#refresh!

Reloads the model with current data from the BigQuery service.

Returns

(Google::Cloud::Bigquery::Model) — Returns the reloaded model.

Example

Skip retrieving the model from the service, then load it:

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.reference? #=> true
model.reload!
model.resource? #=> true

#resource?

def resource?() -> Boolean

Whether the model was created with a resource representation from the BigQuery service.

Returns

(Boolean) — true when the model was created with a resource representation, false otherwise.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model", skip_lookup: true

model.resource? #=> false
model.reload!
model.resource? #=> true

#resource_full?

def resource_full?() -> Boolean

Whether the model was created with a full resource representation from the BigQuery service.

Returns

(Boolean) — true when the model was created with a full resource representation, false otherwise.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.model "my_model"

model.resource_full? #=> true

#resource_partial?

def resource_partial?() -> Boolean

Whether the model was created with a partial resource representation from the BigQuery service by retrieval through Dataset#models. See Models: list response for the contents of the partial representation. Accessing any attribute outside of the partial representation will result in loading the full representation.

Returns

(Boolean) — true when the model was created with a partial resource representation, false otherwise.

Example

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new

dataset = bigquery.dataset "my_dataset"
model = dataset.models.first

model.resource_partial? #=> true
model.description # Loads the full resource.
model.resource_partial? #=> false
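
The load-on-access behavior shown above can be pictured with a small stand-in. `LazyResource` is a hypothetical class for illustration and does not reflect the library's internals; the block passed to it stands in for the request that retrieves the full representation.

```ruby
# Hypothetical sketch of lazy loading: the object starts with a partial
# attribute set and fetches the full one on the first access to a missing
# attribute. The names here are illustrative, not the library's internals.
class LazyResource
  def initialize partial, &fetch_full
    @attributes = partial
    @fetch_full = fetch_full
    @full = false
  end

  def partial?
    !@full
  end

  def [] name
    unless @full || @attributes.key?(name)
      @attributes = @fetch_full.call  # one round trip for the full resource
      @full = true
    end
    @attributes[name]
  end
end

resource = LazyResource.new({ model_id: "my_model" }) do
  { model_id: "my_model", description: "Sales forecast" }
end

puts resource[:model_id]     # served from the partial representation
puts resource.partial?       # prints true
puts resource[:description]  # triggers the full load
puts resource.partial?       # prints false
```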

#training_runs

def training_runs() -> Array<Google::Cloud::Bigquery::Model::TrainingRun>

Information for all training runs in increasing order of startTime.

Returns

(Array<Google::Cloud::Bigquery::Model::TrainingRun>)