- Resource: Model
- ModelReference
- ModelType
- TrainingRun
- TrainingOptions
- LossType
- DataSplitMethod
- LearnRateStrategy
- DistanceType
- OptimizationStrategy
- KmeansInitializationMethod
- IterationResult
- ClusterInfo
- ArimaResult
- ArimaModelInfo
- ArimaOrder
- ArimaCoefficients
- ArimaFittingMetrics
- SeasonalPeriodType
- EvaluationMetrics
- RegressionMetrics
- BinaryClassificationMetrics
- AggregateClassificationMetrics
- BinaryConfusionMatrix
- MultiClassClassificationMetrics
- ConfusionMatrix
- Row
- Entry
- ClusteringMetrics
- Cluster
- FeatureValue
- CategoricalValue
- CategoryCount
- DataSplitResult
- Methods
Resource: Model
JSON representation | |
---|---|
{ "etag": string, "modelReference": { object ( |
Fields | |
---|---|
etag |
Output only. A hash of this resource. |
modelReference |
Required. Unique identifier for this model. |
creationTime |
Output only. The time when this model was created, in milliseconds since the epoch. |
lastModifiedTime |
Output only. The time when this model was last modified, in milliseconds since the epoch. |
description |
Optional. A user-friendly description of this model. |
friendlyName |
Optional. A descriptive name for this model. |
labels |
The labels associated with this model. You can use these to organize and group your models. Label keys and values can be no longer than 63 characters and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter, and each label in the list must have a different key. An object containing a list of |
expirationTime |
Optional. The time when this model expires, in milliseconds since the epoch. If not present, the model will persist indefinitely. Expired models will be deleted and their storage reclaimed. The defaultTableExpirationMs property of the encapsulating dataset can be used to set a default expirationTime on newly created models. |
location |
Output only. The geographic location where the model resides. This value is inherited from the dataset. |
encryptionConfiguration |
Custom encryption configuration (e.g., Cloud KMS keys). This shows the encryption configuration of the model data while stored in BigQuery storage. |
modelType |
Output only. Type of the model resource. |
trainingRuns[] |
Output only. Information for all training runs in increasing order of startTime. |
featureColumns[] |
Output only. Input feature columns that were used to train this model. |
labelColumns[] |
Output only. Label columns that were used to train this model. In the model's output, each of these column names is prefixed with "predicted_". |
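The time fields above are expressed in milliseconds since the epoch. A minimal Python sketch of reading them follows. It assumes the values arrive as decimal strings (the JSON representation above is truncated, so the wire type is an assumption), and `model` is a hypothetical, already-parsed resource rather than real API output.

```python
from datetime import datetime, timezone

def millis_to_datetime(millis):
    """Convert epoch milliseconds (int or decimal string) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)

def is_expired(model, now):
    """Return True if the model has an expirationTime at or before `now`."""
    expiration = model.get("expirationTime")
    if expiration is None:
        return False  # no expirationTime: the model persists indefinitely
    return millis_to_datetime(expiration) <= now

# Hypothetical resource fragment, for illustration only.
model = {"creationTime": "1546300800000", "expirationTime": "1546387200000"}
created = millis_to_datetime(model["creationTime"])
```

Keeping the datetimes timezone-aware avoids accidental comparisons against local time.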
ModelReference
ID path of a model.
JSON representation | |
---|---|
{ "projectId": string, "datasetId": string, "modelId": string } |
Fields | |
---|---|
projectId |
Required. The ID of the project containing this model. |
datasetId |
Required. The ID of the dataset containing this model. |
modelId |
Required. The ID of the model. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. |
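The modelId rule above (letters, digits, and underscores, at most 1,024 characters) can be checked client-side before sending a request. The sketch below is one way to do it; the function name is hypothetical, and rejecting the empty string is an assumption, since the text does not state a minimum length.

```python
import re

# Letters (a-z, A-Z), numbers (0-9), or underscores; at most 1,024 characters.
# Assumption: an empty ID is treated as invalid.
_MODEL_ID_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")

def is_valid_model_id(model_id):
    """Return True if `model_id` satisfies the documented character and length rules."""
    return _MODEL_ID_RE.fullmatch(model_id) is not None
```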
ModelType
Indicates the type of the Model.
Enums | |
---|---|
MODEL_TYPE_UNSPECIFIED |
|
LINEAR_REGRESSION |
Linear regression model. |
LOGISTIC_REGRESSION |
Logistic regression based classification model. |
KMEANS |
K-means clustering model. |
TENSORFLOW |
[Beta] An imported TensorFlow model. |
TrainingRun
Information about a single training query run for the model.
JSON representation | |
---|---|
{ "trainingOptions": { object ( |
Fields | |
---|---|
trainingOptions |
Options that were used for this training run, including both user-specified and default options. |
startTime |
The start time of this training run. |
results[] |
Output of each iteration run; results.size() <= maxIterations. |
evaluationMetrics |
The evaluation metrics over training/eval data that were computed at the end of training. |
dataSplitResult |
Data split result of the training run. Only set when the input data is actually split. |
TrainingOptions
JSON representation | |
---|---|
{ "maxIterations": string, "lossType": enum ( |
Fields | |
---|---|
maxIterations |
The maximum number of iterations in training. Used only for iterative training algorithms. |
lossType |
Type of loss function used during training run. |
learnRate |
Learning rate in training. Used only for iterative training algorithms. |
l1Regularization |
L1 regularization coefficient. |
l2Regularization |
L2 regularization coefficient. |
minRelativeProgress |
When earlyStop is true, stops training when accuracy improvement is less than 'minRelativeProgress'. Used only for iterative training algorithms. |
warmStart |
Whether to train a model from the last checkpoint. |
earlyStop |
Whether to stop early when the loss doesn't improve significantly any more (compared to minRelativeProgress). Used only for iterative training algorithms. |
inputLabelColumns[] |
Name of input label columns in training data. |
dataSplitMethod |
The data split type for training and evaluation, e.g. RANDOM. |
dataSplitEvalFraction |
The fraction of evaluation data over the whole input data. The rest of data will be used as training data. The format should be double. Accurate to two decimal places. Default value is 0.2. |
dataSplitColumn |
The column to split data with. This column won't be used as a feature. 1. When dataSplitMethod is CUSTOM, the corresponding column should be boolean. The rows with true value tag are eval data, and the false are training data. 2. When dataSplitMethod is SEQ, the first DATA_SPLIT_EVAL_FRACTION rows (from smallest to largest) in the corresponding column are used as training data, and the rest are eval data. It respects the order in Orderable data types: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#data-type-properties |
learnRateStrategy |
The strategy to determine learn rate for the current iteration. |
initialLearnRate |
Specifies the initial learning rate for the line search learn rate strategy. |
labelClassWeights |
Weights associated with each label class, for rebalancing the training data. Only applicable for classification models. An object containing a list of |
distanceType |
Distance type for clustering models. |
numClusters |
Number of clusters for clustering models. |
modelUri |
[Beta] Google Cloud Storage URI from which the model was imported. Only applicable for imported models. |
optimizationStrategy |
Optimization strategy for training linear regression models. |
kmeansInitializationMethod |
The method used to initialize the centroids for the k-means algorithm. |
kmeansInitializationColumn |
The column used to provide the initial centroids for the k-means algorithm when kmeansInitializationMethod is CUSTOM. |
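To make the interaction of maxIterations, earlyStop, and minRelativeProgress concrete, here is a small Python sketch. The exact formula the service uses is not given above, so computing relative progress as the fractional decrease in loss between consecutive iterations is an assumption, and both function names are hypothetical.

```python
def should_stop_early(previous_loss, current_loss, min_relative_progress):
    """Return True when the relative loss improvement falls below the threshold."""
    if previous_loss == 0:
        return True  # nothing left to improve
    relative_progress = (previous_loss - current_loss) / abs(previous_loss)
    return relative_progress < min_relative_progress

def iterations_run(losses, max_iterations, early_stop=True, min_relative_progress=0.01):
    """Count how many iterations of a hypothetical loss sequence would run."""
    ran = 0
    for i, loss in enumerate(losses[:max_iterations]):
        ran += 1
        if early_stop and i > 0 and should_stop_early(
            losses[i - 1], loss, min_relative_progress
        ):
            break
    return ran
```

With losses `[1.0, 0.5, 0.3, 0.299, 0.2]`, the fourth iteration improves by less than 1%, so training stops there under these assumptions.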
LossType
Loss metric to evaluate model training performance.
Enums | |
---|---|
LOSS_TYPE_UNSPECIFIED |
|
MEAN_SQUARED_LOSS |
Mean squared loss, used for linear regression. |
MEAN_LOG_LOSS |
Mean log loss, used for logistic regression. |
DataSplitMethod
Indicates the method to split input data into multiple tables.
Enums | |
---|---|
DATA_SPLIT_METHOD_UNSPECIFIED |
|
RANDOM |
Splits data randomly. |
CUSTOM |
Splits data with the user provided tags. |
SEQUENTIAL |
Splits data sequentially. |
NO_SPLIT |
Data split will be skipped. |
AUTO_SPLIT |
Splits data automatically: Uses NO_SPLIT if the data size is small. Otherwise uses RANDOM. |
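As an illustration of two of the methods above, the following Python sketch splits in-memory rows. It follows the dataSplitColumn description for CUSTOM (rows tagged true are eval data, and the split column is not used as a feature). The function name and the dict-per-row representation are assumptions made for the example.

```python
def split_rows(rows, method, split_column=None):
    """Split a list of row dicts into (training, evaluation) lists."""
    if method == "NO_SPLIT":
        # The split is skipped: every row is training data.
        return list(rows), []
    if method == "CUSTOM":
        training, evaluation = [], []
        for row in rows:
            # The split column itself is dropped from the features.
            features = {k: v for k, v in row.items() if k != split_column}
            (evaluation if row[split_column] else training).append(features)
        return training, evaluation
    raise ValueError(f"method not covered by this sketch: {method}")
```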
LearnRateStrategy
Indicates the learning rate optimization strategy to use.
Enums | |
---|---|
LEARN_RATE_STRATEGY_UNSPECIFIED |
|
LINE_SEARCH |
Use line search to determine learning rate. |
CONSTANT |
Use a constant learning rate. |
DistanceType
Distance metric used to compute the distance between two points.
Enums | |
---|---|
DISTANCE_TYPE_UNSPECIFIED |
|
EUCLIDEAN |
Euclidean distance. |
COSINE |
Cosine distance. |
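For reference, the two metrics can be written out in a few lines of Python. This uses the common definition of cosine distance as one minus cosine similarity; whether the service normalizes or handles zero vectors differently is not stated above, so treat this as a sketch.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Cosine distance ignores vector magnitude, so `(1, 0)` and `(2, 0)` are at distance 0, whereas their Euclidean distance is 1.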
OptimizationStrategy
Indicates the optimization strategy used for training.
Enums | |
---|---|
OPTIMIZATION_STRATEGY_UNSPECIFIED |
|
BATCH_GRADIENT_DESCENT |
Uses an iterative batch gradient descent algorithm. |
NORMAL_EQUATION |
Uses the normal equation to solve the linear regression problem. |
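The difference between the two strategies can be seen on a single-feature model y = w*x + b. The sketch below is illustrative only: the closed form is the solution of the normal equations for the one-feature case, and the learning rate and iteration count in the gradient descent version are arbitrary choices, not service defaults.

```python
def fit_normal_equation(xs, ys):
    """Closed-form least squares for y = w*x + b (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x

def fit_batch_gradient_descent(xs, ys, learn_rate=0.05, max_iterations=5000):
    """Iteratively minimize mean squared loss using the full batch each step."""
    n = len(xs)
    w = b = 0.0
    for _ in range(max_iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learn_rate * grad_w
        b -= learn_rate * grad_b
    return w, b
```

The closed form gives the answer in one pass, while gradient descent approaches the same solution over many iterations.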
KmeansInitializationMethod
Indicates the method used to initialize the centroids for the k-means clustering algorithm.
Enums | |
---|---|
KMEANS_INITIALIZATION_METHOD_UNSPECIFIED |
|
RANDOM |
Initializes the centroids randomly. |
CUSTOM |
Initializes the centroids using data specified in kmeansInitializationColumn. |
KMEANS_PLUS_PLUS |
Initializes with kmeans++. |
IterationResult
Information about a single iteration of the training run.
JSON representation | |
---|---|
{ "index": number, "durationMs": string, "trainingLoss": number, "evalLoss": number, "learnRate": number, "clusterInfos": [ { object ( |
Fields | |
---|---|
index |
Index of the iteration, 0 based. |
durationMs |
Time taken to run the iteration in milliseconds. |
trainingLoss |
Loss computed on the training data at the end of iteration. |
evalLoss |
Loss computed on the eval data at the end of iteration. |
learnRate |
Learn rate used for this iteration. |
clusterInfos[] |
Information about top clusters for clustering models. |
arimaResult |
|
ClusterInfo
Information about a single cluster for a clustering model.
JSON representation | |
---|---|
{ "centroidId": string, "clusterRadius": number, "clusterSize": string } |
Fields | |
---|---|
centroidId |
Centroid id. |
clusterRadius |
Cluster radius, the average distance from centroid to each point assigned to the cluster. |
clusterSize |
Cluster size, the total number of points assigned to the cluster. |
ArimaResult
(Auto-)arima fitting result. Wrap everything in ArimaResult for easier refactoring if we want to use model-specific iteration results.
JSON representation | |
---|---|
{ "arimaModelInfo": [ { object ( |
Fields | |
---|---|
arimaModelInfo[] |
This message is repeated because multiple ARIMA models are fitted in auto-ARIMA. For a non-auto-ARIMA model, its size is one. |
seasonalPeriods[] |
Seasonal periods. Repeated because multiple periods are supported for one time series. |
ArimaModelInfo
Arima model information.
JSON representation | |
---|---|
{ "nonSeasonalOrder": { object ( |
Fields | |
---|---|
nonSeasonalOrder |
Non-seasonal order. |
arimaCoefficients |
Arima coefficients. |
arimaFittingMetrics |
Arima fitting metrics. |
ArimaOrder
Arima order, can be used for both non-seasonal and seasonal parts.
JSON representation | |
---|---|
{ "p": string, "d": string, "q": string } |
Fields | |
---|---|
p |
Order of the autoregressive part. |
d |
Order of the differencing part. |
q |
Order of the moving-average part. |
ArimaCoefficients
Arima coefficients.
JSON representation | |
---|---|
{ "autoRegressiveCoefficients": [ number ], "movingAverageCoefficients": [ number ], "interceptCoefficient": number } |
Fields | |
---|---|
autoRegressiveCoefficients[] |
Auto-regressive coefficients, an array of double. |
movingAverageCoefficients[] |
Moving-average coefficients, an array of double. |
interceptCoefficient |
Intercept coefficient, a single double rather than an array. |
ArimaFittingMetrics
ARIMA model fitting metrics.
JSON representation | |
---|---|
{ "logLikelihood": number, "aic": number, "variance": number } |
Fields | |
---|---|
logLikelihood |
Log-likelihood. |
aic |
AIC (Akaike information criterion). |
variance |
Variance. |
SeasonalPeriodType
Enums | |
---|---|
SEASONAL_PERIOD_TYPE_UNSPECIFIED |
|
NO_SEASONALITY |
No seasonality |
DAILY |
Daily period, 24 hours. |
WEEKLY |
Weekly period, 7 days. |
MONTHLY |
Monthly period, either 30 days or irregular. |
QUARTERLY |
Quarterly period, either 90 days or irregular. |
YEARLY |
Yearly period, either 365 days or irregular. |
EvaluationMetrics
Evaluation metrics of a model. These are either computed on all training data or just the eval data based on whether eval data was used during training. These are not present for imported models.
JSON representation | |
---|---|
{ // Union field |
Fields | |
---|---|
Union field (at most one of the following is set): |
regressionMetrics |
Populated for regression models and explicit feedback type matrix factorization models. |
binaryClassificationMetrics |
Populated for binary classification/classifier models. |
multiClassClassificationMetrics |
Populated for multi-class classification/classifier models. |
clusteringMetrics |
Populated for clustering models. |
RegressionMetrics
Evaluation metrics for regression and explicit feedback type matrix factorization models.
JSON representation | |
---|---|
{ "meanAbsoluteError": number, "meanSquaredError": number, "meanSquaredLogError": number, "medianAbsoluteError": number, "rSquared": number } |
Fields | |
---|---|
meanAbsoluteError |
Mean absolute error. |
meanSquaredError |
Mean squared error. |
meanSquaredLogError |
Mean squared log error. |
medianAbsoluteError |
Median absolute error. |
rSquared |
R^2 score. |
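The five metrics above can be reproduced from actual and predicted values with the standard formulas. The Python sketch below uses the common definition of mean squared log error (squared difference of `log(1 + value)`); that the service uses exactly this definition is an assumption.

```python
import math
from statistics import median

def regression_metrics(actual, predicted):
    """Compute the RegressionMetrics fields from two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "meanAbsoluteError": sum(abs(e) for e in errors) / n,
        "meanSquaredError": ss_res / n,
        "meanSquaredLogError": sum(
            (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
        ) / n,
        "medianAbsoluteError": median(abs(e) for e in errors),
        "rSquared": 1 - ss_res / ss_tot,
    }
```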
BinaryClassificationMetrics
Evaluation metrics for binary classification/classifier models.
JSON representation | |
---|---|
{ "aggregateClassificationMetrics": { object ( |
Fields | |
---|---|
aggregateClassificationMetrics |
Aggregate classification metrics. |
binaryConfusionMatrixList[] |
Binary confusion matrix at multiple thresholds. |
positiveLabel |
Label representing the positive class. |
negativeLabel |
Label representing the negative class. |
AggregateClassificationMetrics
Aggregate metrics for classification/classifier models. For multi-class models, the metrics are either macro-averaged or micro-averaged. When macro-averaged, the metrics are calculated for each label and then an unweighted average is taken of those values. When micro-averaged, the metric is calculated globally by counting the total number of correctly predicted rows.
JSON representation | |
---|---|
{ "precision": number, "recall": number, "accuracy": number, "threshold": number, "f1Score": number, "logLoss": number, "rocAuc": number } |
Fields | |
---|---|
precision |
Precision is the fraction of actual positive predictions that had positive actual labels. For multiclass this is a macro-averaged metric treating each class as a binary classifier. |
recall |
Recall is the fraction of actual positive labels that were given a positive prediction. For multiclass this is a macro-averaged metric. |
accuracy |
Accuracy is the fraction of predictions given the correct label. For multiclass this is a micro-averaged metric. |
threshold |
Threshold at which the metrics are computed. For binary classification models this is the positive class threshold. For multi-class classification models this is the confidence threshold. |
f1Score |
The F1 score is the harmonic mean of recall and precision. For multiclass this is a macro-averaged metric. |
logLoss |
Logarithmic Loss. For multiclass this is a macro-averaged metric. |
rocAuc |
Area Under a ROC Curve. For multiclass this is a macro-averaged metric. |
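The macro versus micro distinction described above can be sketched in Python as follows. Precision is macro-averaged (one value per label, then an unweighted mean) and accuracy is micro-averaged (one global count). Scoring a label that is never predicted as 0 is an assumption of this sketch, not documented behavior.

```python
def macro_precision(actual, predicted):
    """Unweighted mean of per-label precision, treating each label as binary."""
    labels = sorted(set(actual) | set(predicted))
    per_label = []
    for label in labels:
        predicted_as = [a for a, p in zip(actual, predicted) if p == label]
        if predicted_as:
            per_label.append(sum(a == label for a in predicted_as) / len(predicted_as))
        else:
            per_label.append(0.0)  # assumption: never-predicted labels score 0
    return sum(per_label) / len(per_label)

def micro_accuracy(actual, predicted):
    """Fraction of all rows whose predicted label matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```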
BinaryConfusionMatrix
Confusion matrix for binary classification models.
JSON representation | |
---|---|
{ "positiveClassThreshold": number, "truePositives": string, "falsePositives": string, "trueNegatives": string, "falseNegatives": string, "precision": number, "recall": number, "f1Score": number, "accuracy": number } |
Fields | |
---|---|
positiveClassThreshold |
Threshold value used when computing each of the following metrics. |
truePositives |
Number of true samples predicted as true. |
falsePositives |
Number of false samples predicted as true. |
trueNegatives |
Number of false samples predicted as false. |
falseNegatives |
Number of true samples predicted as false. |
precision |
The fraction of actual positive predictions that had positive actual labels. |
recall |
The fraction of actual positive labels that were given a positive prediction. |
f1Score |
The harmonic mean of recall and precision. |
accuracy |
The fraction of predictions given the correct label. |
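One entry of binaryConfusionMatrixList could be derived from labels and predicted scores roughly as below. This Python sketch assumes a row is predicted positive when its score is at or above the threshold, and it keeps the counts as integers for arithmetic even though the JSON representation shows them as strings.

```python
def binary_confusion_matrix(actual, scores, threshold):
    """Count outcomes at one threshold and derive the summary metrics."""
    tp = fp = tn = fn = 0
    for is_positive, score in zip(actual, scores):
        predicted_positive = score >= threshold  # assumption: inclusive threshold
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "positiveClassThreshold": threshold,
        "truePositives": tp, "falsePositives": fp,
        "trueNegatives": tn, "falseNegatives": fn,
        "precision": precision, "recall": recall,
        "f1Score": f1, "accuracy": (tp + tn) / len(actual),
    }
```

Calling it once per threshold yields the list of matrices at multiple thresholds.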
MultiClassClassificationMetrics
Evaluation metrics for multi-class classification/classifier models.
JSON representation | |
---|---|
{ "aggregateClassificationMetrics": { object ( |
Fields | |
---|---|
aggregateClassificationMetrics |
Aggregate classification metrics. |
confusionMatrixList[] |
Confusion matrix at different thresholds. |
ConfusionMatrix
Confusion matrix for multi-class classification models.
JSON representation | |
---|---|
{
"confidenceThreshold": number,
"rows": [
{
object ( |
Fields | |
---|---|
confidenceThreshold |
Confidence threshold used when computing the entries of the confusion matrix. |
rows[] |
One row per actual label. |
Row
A single row in the confusion matrix.
JSON representation | |
---|---|
{
"actualLabel": string,
"entries": [
{
object ( |
Fields | |
---|---|
actualLabel |
The original label of this row. |
entries[] |
Info describing predicted label distribution. |
Entry
A single entry in the confusion matrix.
JSON representation | |
---|---|
{ "predictedLabel": string, "itemCount": string } |
Fields | |
---|---|
predictedLabel |
The predicted label. For confidenceThreshold > 0, we will also add an entry indicating the number of items under the confidence threshold. |
itemCount |
Number of items being predicted as this label. |
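The ConfusionMatrix, Row, and Entry messages nest as rows of entries. The Python sketch below builds that shape from label lists, assuming a confidence threshold of 0 (so no extra under-threshold entry is added) and emitting itemCount as a string, as in the JSON representation. The function name is hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Build a ConfusionMatrix-shaped dict with one row per actual label."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # (actual, predicted) -> item count
    rows = []
    for actual_label in sorted(set(actual)):
        entries = [
            {"predictedLabel": p, "itemCount": str(counts[(actual_label, p)])}
            for p in labels
        ]
        rows.append({"actualLabel": actual_label, "entries": entries})
    # Assumption: threshold 0, so every prediction is counted under its label.
    return {"confidenceThreshold": 0.0, "rows": rows}
```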
ClusteringMetrics
Evaluation metrics for clustering models.
JSON representation | |
---|---|
{
"daviesBouldinIndex": number,
"meanSquaredDistance": number,
"clusters": [
{
object ( |
Fields | |
---|---|
daviesBouldinIndex |
Davies-Bouldin index. |
meanSquaredDistance |
Mean of squared distances between each sample and its cluster centroid. |
clusters[] |
[Beta] Information for all clusters. |
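Both numeric fields above follow from the centroids and their assigned points. The Python sketch below uses the standard Davies-Bouldin definition with Euclidean distance (mean, over clusters, of the worst ratio of summed scatters to centroid separation). It needs at least two clusters, and the input layout is an assumption chosen for the example.

```python
import math

def clustering_metrics(clusters):
    """clusters: list of (centroid, points) pairs; each point is a tuple of floats."""
    scatters = []  # average point-to-centroid distance per cluster
    squared = []   # squared point-to-centroid distance for every point
    for centroid, points in clusters:
        dists = [math.dist(centroid, p) for p in points]
        scatters.append(sum(dists) / len(dists))
        squared.extend(d * d for d in dists)
    k = len(clusters)
    worst_ratios = [
        max(
            (scatters[i] + scatters[j]) / math.dist(clusters[i][0], clusters[j][0])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return {
        "daviesBouldinIndex": sum(worst_ratios) / k,
        "meanSquaredDistance": sum(squared) / len(squared),
    }
```

Lower values of both metrics indicate tighter, better-separated clusters.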
Cluster
Message containing the information about one cluster.
JSON representation | |
---|---|
{
"centroidId": string,
"featureValues": [
{
object ( |
Fields | |
---|---|
centroidId |
Centroid id. |
featureValues[] |
Values of highly variant features for this cluster. |
count |
Count of training data rows that were assigned to this cluster. |
FeatureValue
Representative value of a single feature within the cluster.
JSON representation | |
---|---|
{ "featureColumn": string, // Union field |
Fields | |
---|---|
featureColumn |
The feature column name. |
Union field (at most one of the following is set): |
numericalValue |
The numerical feature value. This is the centroid value for this feature. |
categoricalValue |
The categorical feature value. |
CategoricalValue
Representative value of a categorical feature.
JSON representation | |
---|---|
{
"categoryCounts": [
{
object ( |
Fields | |
---|---|
categoryCounts[] |
Counts of all categories for the categorical feature. If there are more than ten categories, the top ten (by count) are returned, plus one more CategoryCount with category "_OTHER_" whose count is the aggregate count of the remaining categories. |
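The top-ten-plus-"_OTHER_" rule can be sketched in Python as below. The function name is hypothetical, tie-breaking among equal counts is left to `Counter.most_common`, and counts are emitted as strings to mirror the CategoryCount JSON representation.

```python
from collections import Counter

def category_counts(values, top_n=10):
    """Return CategoryCount-shaped dicts: the top categories plus an _OTHER_ bucket."""
    counts = Counter(values)
    top = counts.most_common(top_n)
    result = [{"category": c, "count": str(n)} for c, n in top]
    if len(counts) > top_n:
        # Aggregate everything outside the top categories into one entry.
        remaining = sum(counts.values()) - sum(n for _, n in top)
        result.append({"category": "_OTHER_", "count": str(remaining)})
    return result
```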
CategoryCount
Represents the count of a single category within the cluster.
JSON representation | |
---|---|
{ "category": string, "count": string } |
Fields | |
---|---|
category |
The name of the category. |
count |
The count of training samples matching the category within the cluster. |
DataSplitResult
Data split result. This contains references to the training and evaluation data tables that were used to train the model.
JSON representation | |
---|---|
{ "trainingTable": { object ( |
Fields | |
---|---|
trainingTable |
Table reference of the training data after split. |
evaluationTable |
Table reference of the evaluation data after split. |
Methods |
|
---|---|
|
Deletes the model specified by modelId from the dataset. |
|
Gets the specified model resource by model ID. |
|
Lists all models in the specified dataset. |
|
Patches specific fields in the specified model. |