- Resource: Model
- ModelReference
- ModelType
- TrainingRun
- TrainingOptions
- LossType
- DataSplitMethod
- LearnRateStrategy
- DistanceType
- OptimizationStrategy
- BoosterType
- DartNormalizeType
- TreeMethod
- FeedbackType
- KmeansInitializationMethod
- ArimaOrder
- DataFrequency
- HolidayRegion
- HparamTuningObjective
- EncodingMethod
- PcaSolver
- ModelRegistry
- IterationResult
- ClusterInfo
- ArimaResult
- ArimaModelInfo
- ArimaCoefficients
- ArimaFittingMetrics
- SeasonalPeriodType
- PrincipalComponentInfo
- EvaluationMetrics
- RegressionMetrics
- BinaryClassificationMetrics
- AggregateClassificationMetrics
- BinaryConfusionMatrix
- MultiClassClassificationMetrics
- ConfusionMatrix
- Row
- Entry
- ClusteringMetrics
- Cluster
- FeatureValue
- CategoricalValue
- CategoryCount
- RankingMetrics
- ArimaForecastingMetrics
- ArimaSingleModelForecastingMetrics
- DimensionalityReductionMetrics
- DataSplitResult
- GlobalExplanation
- Explanation
- TransformColumn
- HparamSearchSpaces
- DoubleHparamSearchSpace
- DoubleRange
- DoubleCandidates
- IntHparamSearchSpace
- IntRange
- IntCandidates
- IntArrayHparamSearchSpace
- IntArray
- StringHparamSearchSpace
- HparamTuningTrial
- TrialStatus
- RemoteModelInfo
- RemoteServiceType
- Methods
Resource: Model
JSON representation |
---|
{ "etag": string, "modelReference": { object ( |
Field | Description
---|---
etag | Output only. A hash of this resource.
modelReference | Required. Unique identifier for this model.
creationTime | Output only. The time when this model was created, in milliseconds since the epoch.
lastModifiedTime | Output only. The time when this model was last modified, in milliseconds since the epoch.
description | Optional. A user-friendly description of this model.
friendlyName | Optional. A descriptive name for this model.
labels | The labels associated with this model. You can use these to organize and group your models. Label keys and values can be no longer than 63 characters, can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter, and each label in the list must have a different key.
expirationTime | Optional. The time when this model expires, in milliseconds since the epoch. If not present, the model persists indefinitely. Expired models are deleted and their storage reclaimed. The defaultTableExpirationMs property of the encapsulating dataset can be used to set a default expirationTime on newly created models.
location | Output only. The geographic location where the model resides. This value is inherited from the dataset.
encryptionConfiguration | Custom encryption configuration (e.g., Cloud KMS keys). This shows the encryption configuration of the model data while stored in BigQuery storage. This field can be used with models.patch to update the encryption key for an already encrypted model.
modelType | Output only. Type of the model resource.
trainingRuns[] | Information for all training runs, in increasing order of startTime.
featureColumns[] | Output only. Input feature columns for the model inference. If the model is trained with a TRANSFORM clause, these are the input of the TRANSFORM clause.
labelColumns[] | Output only. Label columns that were used to train this model. The output of the model will have a "predicted_" prefix for these columns.
transformColumns[] | Output only. This field is populated if a TRANSFORM clause was used to train a model. The TRANSFORM clause (if used) takes featureColumns as input and outputs transformColumns. transformColumns are then used to train the model.
hparamSearchSpaces | Output only. All hyperparameter search spaces in this model.
bestTrialId | The best trialId across all training runs.
defaultTrialId | Output only. The default trialId to use in TVFs when the trialId is not passed in. For single-objective hyperparameter tuning models, this is the best trial ID. For multi-objective hyperparameter tuning models, this is the smallest trial ID among all Pareto optimal trials.
hparamTrials[] | Output only. Trials of a hyperparameter tuning model, sorted by trialId.
optimalTrialIds[] | Output only. For single-objective hyperparameter tuning models, this contains only the best trial. For multi-objective hyperparameter tuning models, it contains all Pareto optimal trials sorted by trialId.
remoteModelInfo | Output only. Remote model info.
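As a quick illustration of how these fields surface in practice, here is a minimal sketch that fetches a model with the google-cloud-bigquery Python client and prints a few of them. The project, dataset, and model names are hypothetical, and the attribute names used below are the client library's snake_case counterparts of the JSON fields above; they may vary between client versions.

```python
# Minimal sketch: fetch a Model resource and read a few of the fields above.
# Assumes the google-cloud-bigquery library is installed and authenticated,
# and that a model "my_project.my_dataset.my_model" (hypothetical) exists.
from google.cloud import bigquery

client = bigquery.Client()

# get_model accepts a "project.dataset.model" path or a ModelReference.
model = client.get_model("my_project.my_dataset.my_model")

print(model.model_type)    # modelType, e.g. "LINEAR_REGRESSION"
print(model.created)       # creationTime, surfaced as a datetime
print(model.expires)       # expirationTime, or None if the model never expires
print(model.description)   # user-provided description
print(model.labels)        # dict of label key/value pairs
```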
ModelReference
Id path of a model.
JSON representation |
---|
{ "projectId": string, "datasetId": string, "modelId": string } |
Field | Description
---|---
projectId | Required. The ID of the project containing this model.
datasetId | Required. The ID of the dataset containing this model.
modelId | Required. The ID of the model. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
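The modelId character and length constraints above are easy to check client-side before issuing a request. The helper below is a small sketch; the function name and the example IDs are hypothetical.

```python
import re

# The modelId rule documented above: only [A-Za-z0-9_], at most 1,024 characters.
_MODEL_ID_RE = re.compile(r"^[A-Za-z0-9_]{1,1024}$")

def model_path(project_id: str, dataset_id: str, model_id: str) -> str:
    """Return the REST resource path for a model, validating the model ID first."""
    if not _MODEL_ID_RE.match(model_id):
        raise ValueError("modelId must contain only letters, numbers, or underscores "
                         "and be at most 1,024 characters long")
    return f"projects/{project_id}/datasets/{dataset_id}/models/{model_id}"

# Example (hypothetical IDs):
# model_path("my_project", "my_dataset", "churn_model")
#   -> "projects/my_project/datasets/my_dataset/models/churn_model"
```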
ModelType
Indicates the type of the Model.
Enum | Description
---|---
MODEL_TYPE_UNSPECIFIED | Default value.
LINEAR_REGRESSION | Linear regression model.
LOGISTIC_REGRESSION | Logistic regression based classification model.
KMEANS | K-means clustering model.
MATRIX_FACTORIZATION | Matrix factorization model.
DNN_CLASSIFIER | DNN classifier model.
TENSORFLOW | An imported TensorFlow model.
DNN_REGRESSOR | DNN regressor model.
XGBOOST | An imported XGBoost model.
BOOSTED_TREE_REGRESSOR | Boosted tree regressor model.
BOOSTED_TREE_CLASSIFIER | Boosted tree classifier model.
ARIMA | ARIMA model.
AUTOML_REGRESSOR | AutoML Tables regression model.
AUTOML_CLASSIFIER | AutoML Tables classification model.
PCA | Principal Component Analysis model.
DNN_LINEAR_COMBINED_CLASSIFIER | Wide-and-deep classifier model.
DNN_LINEAR_COMBINED_REGRESSOR | Wide-and-deep regressor model.
AUTOENCODER | Autoencoder model.
ARIMA_PLUS | New name for the ARIMA model.
ARIMA_PLUS_XREG | ARIMA with external regressors.
RANDOM_FOREST_REGRESSOR | Random forest regressor model.
RANDOM_FOREST_CLASSIFIER | Random forest classifier model.
TENSORFLOW_LITE | An imported TensorFlow Lite model.
ONNX | An imported ONNX model.
TRANSFORM_ONLY | Model to capture the columns and logic in the TRANSFORM clause along with statistics useful for ML analytic functions.
CONTRIBUTION_ANALYSIS | The contribution analysis model.
TrainingRun
Information about a single training query run for the model.
JSON representation |
---|
{ "trainingOptions": { object ( |
Field | Description
---|---
trainingOptions | Output only. Options that were used for this training run, including user-specified and default options.
trainingStartTime | Output only. The start time of this training run, in milliseconds since the epoch.
startTime | Output only. The start time of this training run.
results[] | Output only. Output of each iteration run; results.size() <= maxIterations.
evaluationMetrics | Output only. The evaluation metrics over training/eval data that were computed at the end of training.
dataSplitResult | Output only. Data split result of the training run. Only set when the input data is actually split.
modelLevelGlobalExplanation | Output only. Global explanation contains the explanation of top features on the model level. Applies to both regression and classification models.
classLevelGlobalExplanations[] | Output only. Global explanation contains the explanation of top features on the class level. Applies to classification models only.
vertexAiModelId | The model ID in the Vertex AI Model Registry for this training run.
vertexAiModelVersion | Output only. The model version in the Vertex AI Model Registry for this training run.
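A short sketch of how the trainingRuns list can be walked when the Model resource is held as raw JSON (for example, the body of a models.get response). The helper name is hypothetical.

```python
# Sketch: summarize the trainingRuns of a Model resource fetched as raw JSON.
# Field names follow the reference above.

def summarize_training_runs(model_resource: dict) -> None:
    for run in model_resource.get("trainingRuns", []):
        start = run.get("startTime")
        n_iterations = len(run.get("results", []))
        metrics = run.get("evaluationMetrics", {})
        print(f"run started {start}: {n_iterations} iterations, "
              f"metric groups present: {sorted(metrics)}")
```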
TrainingOptions
Options used in model training.
JSON representation |
---|
{ "maxIterations": string, "lossType": enum ( |
Field | Description
---|---
maxIterations | The maximum number of iterations in training. Used only for iterative training algorithms.
lossType | Type of loss function used during the training run.
learnRate | Learning rate in training. Used only for iterative training algorithms.
l1Regularization | L1 regularization coefficient.
l2Regularization | L2 regularization coefficient.
minRelativeProgress | When earlyStop is true, stops training when accuracy improvement is less than 'minRelativeProgress'. Used only for iterative training algorithms.
warmStart | Whether to train a model from the last checkpoint.
earlyStop | Whether to stop early when the loss doesn't improve significantly any more (compared to minRelativeProgress). Used only for iterative training algorithms.
inputLabelColumns[] | Name of input label columns in training data.
dataSplitMethod | The data split type for training and evaluation, e.g. RANDOM.
dataSplitEvalFraction | The fraction of evaluation data over the whole input data. The rest of the data will be used as training data. The format should be double, accurate to two decimal places. Default value is 0.2.
dataSplitColumn | The column to split data with. This column won't be used as a feature. 1. When dataSplitMethod is CUSTOM, the corresponding column should be boolean. The rows with a true value tag are eval data, and the false are training data. 2. When dataSplitMethod is SEQ, the first DATA_SPLIT_EVAL_FRACTION rows (from smallest to largest) in the corresponding column are used as training data, and the rest are eval data. It respects the order in Orderable data types: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#data-type-properties
learnRateStrategy | The strategy to determine the learn rate for the current iteration.
initialLearnRate | Specifies the initial learning rate for the line search learn rate strategy.
labelClassWeights | Weights associated with each label class, for rebalancing the training data. Only applicable for classification models.
userColumn | User column specified for matrix factorization models.
itemColumn | Item column specified for matrix factorization models.
distanceType | Distance type for clustering models.
numClusters | Number of clusters for clustering models.
modelUri | Google Cloud Storage URI from which the model was imported. Only applicable for imported models.
optimizationStrategy | Optimization strategy for training linear regression models.
batchSize | Batch size for dnn models.
dropout | Dropout probability for dnn models.
maxTreeDepth | Maximum depth of a tree for boosted tree models.
subsample | Subsample fraction of the training data to grow tree to prevent overfitting for boosted tree models.
minSplitLoss | Minimum split loss for boosted tree models.
boosterType | Booster type for boosted tree models.
numParallelTree | Number of parallel trees constructed during each iteration for boosted tree models.
dartNormalizeType | Type of normalization algorithm for boosted tree models using dart booster.
treeMethod | Tree construction algorithm for boosted tree models.
minTreeChildWeight | Minimum sum of instance weight needed in a child for boosted tree models.
colsampleBytree | Subsample ratio of columns when constructing each tree for boosted tree models.
colsampleBylevel | Subsample ratio of columns for each level for boosted tree models.
colsampleBynode | Subsample ratio of columns for each node (split) for boosted tree models.
numFactors | Num factors specified for matrix factorization models.
feedbackType | Feedback type that specifies which algorithm to run for matrix factorization.
walsAlpha | Hyperparameter for matrix factorization when implicit feedback type is specified.
kmeansInitializationMethod | The method used to initialize the centroids for the kmeans algorithm.
kmeansInitializationColumn | The column used to provide the initial centroids for the kmeans algorithm when kmeansInitializationMethod is CUSTOM.
timeSeriesTimestampColumn | Column to be designated as the time series timestamp for the ARIMA model.
timeSeriesDataColumn | Column to be designated as the time series data for the ARIMA model.
autoArima | Whether to enable auto ARIMA or not.
nonSeasonalOrder | A specification of the non-seasonal part of the ARIMA model: the three components (p, d, q) are the AR order, the degree of differencing, and the MA order.
dataFrequency | The data frequency of a time series.
calculatePValues | Whether or not the p-value test should be computed for this model. Only available for linear and logistic regression models.
includeDrift | Include drift when fitting an ARIMA model.
holidayRegion | The geographical region based on which the holidays are considered in time series modeling. If a valid value is specified, then holiday effects modeling is enabled.
holidayRegions[] | A list of geographical regions that are used for time series modeling.
timeSeriesIdColumn | The time series id column that was used during ARIMA model training.
timeSeriesIdColumns[] | The time series id columns that were used during ARIMA model training.
horizon | The number of periods ahead that need to be forecasted.
autoArimaMaxOrder | The max value of the sum of non-seasonal p and q.
autoArimaMinOrder | The min value of the sum of non-seasonal p and q.
numTrials | Number of trials to run this hyperparameter tuning job.
maxParallelTrials | Maximum number of trials to run in parallel.
hparamTuningObjectives[] | The target evaluation metrics to optimize the hyperparameters for.
decomposeTimeSeries | If true, perform decompose time series and save the results.
cleanSpikesAndDips | If true, clean spikes and dips in the input time series.
adjustStepChanges | If true, detect step changes and make data adjustment in the input time series.
enableGlobalExplain | If true, enable global explanation during training.
sampledShapleyNumPaths | Number of paths for the sampled Shapley explain method.
integratedGradientsNumSteps | Number of integral steps for the integrated gradients explain method.
categoryEncodingMethod | Categorical feature encoding method.
tfVersion | Based on the selected TF version, the corresponding docker image is used to train external models.
instanceWeightColumn | Name of the instance weight column for training data. This column isn't used as a feature.
trendSmoothingWindowSize | Smoothing window size for the trend component. When a positive value is specified, a center moving average smoothing is applied on the history trend. When the smoothing window is out of the boundary at the beginning or the end of the trend, the first element or the last element is padded to fill the smoothing window before the average is applied.
timeSeriesLengthFraction | The fraction of the interpolated length of the time series that's used to model the time series trend component. All of the time points of the time series are used to model the non-trend component. This training option accelerates modeling training without sacrificing much forecasting accuracy. You can use this option with minTimeSeriesLength but not with maxTimeSeriesLength.
minTimeSeriesLength | The minimum number of time points in a time series that are used in modeling the trend component of the time series. If you use this option you must also set the timeSeriesLengthFraction option.
maxTimeSeriesLength | The maximum number of time points in a time series that can be used in modeling the trend component of the time series. Don't use this option with the timeSeriesLengthFraction or minTimeSeriesLength options.
xgboostVersion | User-selected XGBoost versions for training of XGBoost models.
approxGlobalFeatureContrib | Whether to use the approximate feature contribution method in XGBoost model explanation for global explain.
fitIntercept | Whether the model should include intercept during model training.
numPrincipalComponents | Number of principal components to keep in the PCA model. Must be <= the number of features.
pcaExplainedVarianceRatio | The minimum ratio of cumulative explained variance that needs to be given by the PCA model.
scaleFeatures | If true, scale the feature values by dividing the feature standard deviation. Currently only applies to PCA.
pcaSolver | The solver for PCA.
autoClassWeights | Whether to calculate class weights automatically based on the popularity of each label.
activationFn | Activation function of the neural nets.
optimizer | Optimizer used for training the neural nets.
budgetHours | Budget in hours for AutoML training.
standardizeFeatures | Whether to standardize numerical features. Defaults to true.
l1RegActivation | L1 regularization coefficient to activations.
modelRegistry | The model registry.
vertexAiModelVersionAliases[] | The version aliases to apply in the Vertex AI Model Registry. Always overwrites the version aliases if they already exist in an existing model.
dimensionIdColumns[] | Optional. Names of the columns to slice on. Applies to contribution analysis models.
contributionMetric | The contribution metric. Applies to contribution analysis models. The allowed formats supported are for summable and summable ratio contribution metrics. These include expressions such as SUM(x) or SUM(x)/SUM(y), where x and y are column names from the base table.
isTestColumn | Name of the column used to determine the rows corresponding to control and test. Applies to contribution analysis models.
minAprioriSupport | The apriori support minimum. Applies to contribution analysis models.
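Training options are set through model creation, but the effective values used by a run are echoed back in trainingOptions. Because trainingRuns are ordered by increasing startTime, the options of the most recent run can be read from the last element. A minimal sketch, assuming the Model resource is available as a raw JSON dict; the helper name is hypothetical.

```python
# Sketch: read the effective training options of the latest training run from
# a raw Model resource (dict), using the field names documented above.

def latest_training_options(model_resource: dict) -> dict:
    runs = model_resource.get("trainingRuns", [])
    if not runs:
        return {}
    return runs[-1].get("trainingOptions", {})

# Example usage (model_json is a models.get response body):
# opts = latest_training_options(model_json)
# opts.get("maxIterations"), opts.get("learnRateStrategy"), opts.get("l1Regularization")
```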
LossType
Loss metric to evaluate model training performance.
Enum | Description
---|---
LOSS_TYPE_UNSPECIFIED | Default value.
MEAN_SQUARED_LOSS | Mean squared loss, used for linear regression.
MEAN_LOG_LOSS | Mean log loss, used for logistic regression.
DataSplitMethod
Indicates the method to split input data into multiple tables.
Enum | Description
---|---
DATA_SPLIT_METHOD_UNSPECIFIED | Default value.
RANDOM | Splits data randomly.
CUSTOM | Splits data with the user provided tags.
SEQUENTIAL | Splits data sequentially.
NO_SPLIT | Data split will be skipped.
AUTO_SPLIT | Splits data automatically: uses NO_SPLIT if the data size is small; otherwise uses RANDOM.
LearnRateStrategy
Indicates the learning rate optimization strategy to use.
Enum | Description
---|---
LEARN_RATE_STRATEGY_UNSPECIFIED | Default value.
LINE_SEARCH | Use line search to determine the learning rate.
CONSTANT | Use a constant learning rate.
DistanceType
Distance metric used to compute the distance between two points.
Enum | Description
---|---
DISTANCE_TYPE_UNSPECIFIED | Default value.
EUCLIDEAN | Euclidean distance.
COSINE | Cosine distance.
OptimizationStrategy
Indicates the optimization strategy used for training.
Enum | Description
---|---
OPTIMIZATION_STRATEGY_UNSPECIFIED | Default value.
BATCH_GRADIENT_DESCENT | Uses an iterative batch gradient descent algorithm.
NORMAL_EQUATION | Uses a normal equation to solve the linear regression problem.
BoosterType
Booster types supported. Refer to booster parameter in XGBoost.
Enum | Description
---|---
BOOSTER_TYPE_UNSPECIFIED | Unspecified booster type.
GBTREE | Gbtree booster.
DART | Dart booster.
DartNormalizeType
Type of normalization algorithm for boosted tree models using dart booster. Refer to normalize_type in XGBoost.
Enum | Description
---|---
DART_NORMALIZE_TYPE_UNSPECIFIED | Unspecified dart normalize type.
TREE | New trees have the same weight as each of the dropped trees.
FOREST | New trees have the same weight as the sum of the dropped trees.
TreeMethod
Tree construction algorithm used in boosted tree models. Refer to treeMethod in XGBoost.
Enum | Description
---|---
TREE_METHOD_UNSPECIFIED | Unspecified tree method.
AUTO | Use heuristic to choose the fastest method.
EXACT | Exact greedy algorithm.
APPROX | Approximate greedy algorithm using quantile sketch and gradient histogram.
HIST | Fast histogram optimized approximate greedy algorithm.
FeedbackType
Indicates the training algorithm to use for matrix factorization models.
Enum | Description
---|---
FEEDBACK_TYPE_UNSPECIFIED | Default value.
IMPLICIT | Use weighted-als for implicit feedback problems.
EXPLICIT | Use nonweighted-als for explicit feedback problems.
KmeansInitializationMethod
Indicates the method used to initialize the centroids for KMeans clustering algorithm.
Enum | Description
---|---
KMEANS_INITIALIZATION_METHOD_UNSPECIFIED | Unspecified initialization method.
RANDOM | Initializes the centroids randomly.
CUSTOM | Initializes the centroids using data specified in kmeansInitializationColumn.
KMEANS_PLUS_PLUS | Initializes with kmeans++.
ArimaOrder
ARIMA order; can be used for both non-seasonal and seasonal parts.
JSON representation |
---|
{ "p": string, "d": string, "q": string } |
Field | Description
---|---
p | Order of the autoregressive part.
d | Order of the differencing part.
q | Order of the moving-average part.
DataFrequency
Type of supported data frequency for time series forecasting models.
Enum | Description
---|---
DATA_FREQUENCY_UNSPECIFIED | Default value.
AUTO_FREQUENCY | Automatically inferred from timestamps.
YEARLY | Yearly data.
QUARTERLY | Quarterly data.
MONTHLY | Monthly data.
WEEKLY | Weekly data.
DAILY | Daily data.
HOURLY | Hourly data.
PER_MINUTE | Per-minute data.
HolidayRegion
Type of supported holiday regions for time series forecasting models.
Enum | Description
---|---
HOLIDAY_REGION_UNSPECIFIED | Holiday region unspecified.
GLOBAL | Global.
NA | North America.
JAPAC | Japan and Asia Pacific: Korea, Greater China, India, Australia, and New Zealand.
EMEA | Europe, the Middle East and Africa.
LAC | Latin America and the Caribbean.
AE | United Arab Emirates
AR | Argentina
AT | Austria
AU | Australia
BE | Belgium
BR | Brazil
CA | Canada
CH | Switzerland
CL | Chile
CN | China
CO | Colombia
CS | Czechoslovakia
CZ | Czech Republic
DE | Germany
DK | Denmark
DZ | Algeria
EC | Ecuador
EE | Estonia
EG | Egypt
ES | Spain
FI | Finland
FR | France
GB | Great Britain (United Kingdom)
GR | Greece
HK | Hong Kong
HU | Hungary
ID | Indonesia
IE | Ireland
IL | Israel
IN | India
IR | Iran
IT | Italy
JP | Japan
KR | Korea (South)
LV | Latvia
MA | Morocco
MX | Mexico
MY | Malaysia
NG | Nigeria
NL | Netherlands
NO | Norway
NZ | New Zealand
PE | Peru
PH | Philippines
PK | Pakistan
PL | Poland
PT | Portugal
RO | Romania
RS | Serbia
RU | Russian Federation
SA | Saudi Arabia
SE | Sweden
SG | Singapore
SI | Slovenia
SK | Slovakia
TH | Thailand
TR | Turkey
TW | Taiwan
UA | Ukraine
US | United States
VE | Venezuela
VN | Viet Nam
ZA | South Africa
HparamTuningObjective
Available evaluation metrics used as hyperparameter tuning objectives.
Enum | Description
---|---
HPARAM_TUNING_OBJECTIVE_UNSPECIFIED | Unspecified evaluation metric.
MEAN_ABSOLUTE_ERROR | Mean absolute error. meanAbsoluteError = AVG(ABS(label - predicted))
MEAN_SQUARED_ERROR | Mean squared error. meanSquaredError = AVG(POW(label - predicted, 2))
MEAN_SQUARED_LOG_ERROR | Mean squared log error. meanSquaredLogError = AVG(POW(LN(1 + label) - LN(1 + predicted), 2))
MEDIAN_ABSOLUTE_ERROR | Median absolute error. medianAbsoluteError = APPROX_QUANTILES(absolute_error, 2)[OFFSET(1)]
R_SQUARED | R^2 score. This corresponds to r2_score in ML.EVALUATE. rSquared = 1 - SUM(squared_error)/(COUNT(label)*VAR_POP(label))
EXPLAINED_VARIANCE | Explained variance. explainedVariance = 1 - VAR_POP(label_error)/VAR_POP(label)
PRECISION | Precision is the fraction of actual positive predictions that had positive actual labels. For multiclass, this is a macro-averaged metric treating each class as a binary classifier.
RECALL | Recall is the fraction of actual positive labels that were given a positive prediction. For multiclass, this is a macro-averaged metric.
ACCURACY | Accuracy is the fraction of predictions given the correct label. For multiclass, this is a globally micro-averaged metric.
F1_SCORE | The F1 score is an average of recall and precision. For multiclass, this is a macro-averaged metric.
LOG_LOSS | Logarithmic loss. For multiclass, this is a macro-averaged metric.
ROC_AUC | Area under an ROC curve. For multiclass, this is a macro-averaged metric.
DAVIES_BOULDIN_INDEX | Davies-Bouldin index.
MEAN_AVERAGE_PRECISION | Mean average precision.
NORMALIZED_DISCOUNTED_CUMULATIVE_GAIN | Normalized discounted cumulative gain.
AVERAGE_RANK | Average rank.
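For reference, the regression objectives above translate directly into code. The sketch below implements the documented formulas in plain Python over parallel lists of labels and predictions; it is illustrative only and not how BigQuery computes them server-side.

```python
import math

# Illustrative implementations of the regression tuning objectives, following
# the formulas documented above (label/predicted are parallel lists of floats).

def mean_absolute_error(label, predicted):
    return sum(abs(y - p) for y, p in zip(label, predicted)) / len(label)

def mean_squared_error(label, predicted):
    return sum((y - p) ** 2 for y, p in zip(label, predicted)) / len(label)

def mean_squared_log_error(label, predicted):
    return sum((math.log(1 + y) - math.log(1 + p)) ** 2
               for y, p in zip(label, predicted)) / len(label)

def r_squared(label, predicted):
    # rSquared = 1 - SUM(squared_error) / (COUNT(label) * VAR_POP(label))
    mean_y = sum(label) / len(label)
    var_pop = sum((y - mean_y) ** 2 for y in label) / len(label)
    sse = sum((y - p) ** 2 for y, p in zip(label, predicted))
    return 1 - sse / (len(label) * var_pop)
```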
EncodingMethod
Supported encoding methods for categorical features.
Enum | Description
---|---
ENCODING_METHOD_UNSPECIFIED | Unspecified encoding method.
ONE_HOT_ENCODING | Applies one-hot encoding.
LABEL_ENCODING | Applies label encoding.
DUMMY_ENCODING | Applies dummy encoding.
PcaSolver
Enums for supported PCA solvers.
Enum | Description
---|---
UNSPECIFIED | Default value.
FULL | Full eigen-decomposition.
RANDOMIZED | Randomized SVD.
AUTO | Auto.
ModelRegistry
Enums for supported model registries.
Enum | Description
---|---
MODEL_REGISTRY_UNSPECIFIED | Default value.
VERTEX_AI | Vertex AI.
IterationResult
Information about a single iteration of the training run.
JSON representation |
---|
{ "index": integer, "durationMs": string, "trainingLoss": number, "evalLoss": number, "learnRate": number, "clusterInfos": [ { object ( |
Field | Description
---|---
index | Index of the iteration, 0 based.
durationMs | Time taken to run the iteration, in milliseconds.
trainingLoss | Loss computed on the training data at the end of the iteration.
evalLoss | Loss computed on the eval data at the end of the iteration.
learnRate | Learn rate used for this iteration.
clusterInfos[] | Information about top clusters for clustering models.
arimaResult | Arima result.
principalComponentInfos[] | The information of the principal components.
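A small sketch that traces the loss curve from the results[] entries of a training run held as raw JSON; the helper name is hypothetical.

```python
# Sketch: print per-iteration losses from the IterationResult entries of a
# training run (dict as returned by models.get; field names as above).

def print_loss_curve(training_run: dict) -> None:
    for iteration in training_run.get("results", []):
        print(f"iteration {iteration.get('index')}: "
              f"trainingLoss={iteration.get('trainingLoss')}, "
              f"evalLoss={iteration.get('evalLoss')}, "
              f"learnRate={iteration.get('learnRate')}")
```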
ClusterInfo
Information about a single cluster for clustering model.
JSON representation |
---|
{ "centroidId": string, "clusterRadius": number, "clusterSize": string } |
Field | Description
---|---
centroidId | Centroid id.
clusterRadius | Cluster radius, the average distance from the centroid to each point assigned to the cluster.
clusterSize | Cluster size, the total number of points assigned to the cluster.
ArimaResult
(Auto-)ARIMA fitting result. Everything is wrapped in ArimaResult for easier refactoring if model-specific iteration results are needed.
JSON representation |
---|
{ "arimaModelInfo": [ { object ( |
Field | Description
---|---
arimaModelInfo[] | This message is repeated because there are multiple arima models fitted in auto-arima. For non-auto-arima models, its size is one.
seasonalPeriods[] | Seasonal periods. Repeated because multiple periods are supported for one time series.
ArimaModelInfo
Arima model information.
JSON representation |
---|
{ "nonSeasonalOrder": { object ( |
Field | Description
---|---
nonSeasonalOrder | Non-seasonal order.
arimaCoefficients | Arima coefficients.
arimaFittingMetrics | Arima fitting metrics.
hasDrift | Whether the Arima model was fitted with drift or not. It is always false when d is not 1.
timeSeriesId | The timeSeriesId value for this time series. It will be one of the unique values from the timeSeriesIdColumn specified during ARIMA model training. Only present when the timeSeriesIdColumn training option was used.
timeSeriesIds[] | The tuple of timeSeriesIds identifying this time series. It will be one of the unique tuples of values present in the timeSeriesIdColumns specified during ARIMA model training. Only present when the timeSeriesIdColumns training option was used, and the order of values here is the same as the order of timeSeriesIdColumns.
seasonalPeriods[] | Seasonal periods. Repeated because multiple periods are supported for one time series.
hasHolidayEffect | If true, holiday_effect is a part of the time series decomposition result.
hasSpikesAndDips | If true, spikes_and_dips is a part of the time series decomposition result.
hasStepChanges | If true, step_changes is a part of the time series decomposition result.
ArimaCoefficients
Arima coefficients.
JSON representation |
---|
{ "autoRegressiveCoefficients": [ number ], "movingAverageCoefficients": [ number ], "interceptCoefficient": number } |
Field | Description
---|---
autoRegressiveCoefficients[] | Auto-regressive coefficients, an array of double.
movingAverageCoefficients[] | Moving-average coefficients, an array of double.
interceptCoefficient | Intercept coefficient, just a double, not an array.
ArimaFittingMetrics
ARIMA model fitting metrics.
JSON representation |
---|
{ "logLikelihood": number, "aic": number, "variance": number } |
Field | Description
---|---
logLikelihood | Log-likelihood.
aic | AIC.
variance | Variance.
SeasonalPeriodType
Seasonal period type.
Enum | Description
---|---
SEASONAL_PERIOD_TYPE_UNSPECIFIED | Unspecified seasonal period.
NO_SEASONALITY | No seasonality.
DAILY | Daily period, 24 hours.
WEEKLY | Weekly period, 7 days.
MONTHLY | Monthly period, 30 days or irregular.
QUARTERLY | Quarterly period, 90 days or irregular.
YEARLY | Yearly period, 365 days or irregular.
PrincipalComponentInfo
Principal component infos, used only for eigen decomposition based models, e.g., PCA. Ordered by explainedVariance in descending order.
JSON representation |
---|
{ "principalComponentId": string, "explainedVariance": number, "explainedVarianceRatio": number, "cumulativeExplainedVarianceRatio": number } |
Field | Description
---|---
principalComponentId | Id of the principal component.
explainedVariance | Explained variance by this principal component, which is simply the eigenvalue.
explainedVarianceRatio | Explained variance over the total explained variance.
cumulativeExplainedVarianceRatio | The explainedVariance is pre-ordered in descending order to compute the cumulative explained variance ratio.
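The ratio fields above can be recomputed from the explainedVariance values alone, since the components are ordered by descending explained variance. A minimal illustrative sketch:

```python
# Sketch: recompute explainedVarianceRatio and cumulativeExplainedVarianceRatio
# from a descending-ordered list of explainedVariance values (eigenvalues).

def explained_variance_ratios(explained_variances):
    total = sum(explained_variances)
    cumulative, out = 0.0, []
    for ev in explained_variances:
        ratio = ev / total
        cumulative += ratio
        out.append({"explainedVarianceRatio": ratio,
                    "cumulativeExplainedVarianceRatio": cumulative})
    return out
```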
EvaluationMetrics
Evaluation metrics of a model. These are either computed on all training data or just the eval data based on whether eval data was used during training. These are not present for imported models.
JSON representation |
---|
{ // Union field |
Field | Description
---|---
Union field metrics. metrics can be only one of the following: |
regressionMetrics | Populated for regression models and explicit feedback type matrix factorization models.
binaryClassificationMetrics | Populated for binary classification/classifier models.
multiClassClassificationMetrics | Populated for multi-class classification/classifier models.
clusteringMetrics | Populated for clustering models.
rankingMetrics | Populated for implicit feedback type matrix factorization models.
arimaForecastingMetrics | Populated for ARIMA models.
dimensionalityReductionMetrics | Evaluation metrics when the model is a dimensionality reduction model, which currently includes PCA.
RegressionMetrics
Evaluation metrics for regression and explicit feedback type matrix factorization models.
JSON representation |
---|
{ "meanAbsoluteError": number, "meanSquaredError": number, "meanSquaredLogError": number, "medianAbsoluteError": number, "rSquared": number } |
Field | Description
---|---
meanAbsoluteError | Mean absolute error.
meanSquaredError | Mean squared error.
meanSquaredLogError | Mean squared log error.
medianAbsoluteError | Median absolute error.
rSquared | R^2 score. This corresponds to r2_score in ML.EVALUATE.
BinaryClassificationMetrics
Evaluation metrics for binary classification/classifier models.
JSON representation |
---|
{ "aggregateClassificationMetrics": { object ( |
Field | Description
---|---
aggregateClassificationMetrics | Aggregate classification metrics.
binaryConfusionMatrixList[] | Binary confusion matrix at multiple thresholds.
positiveLabel | Label representing the positive class.
negativeLabel | Label representing the negative class.
AggregateClassificationMetrics
Aggregate metrics for classification/classifier models. For multi-class models, the metrics are either macro-averaged or micro-averaged. When macro-averaged, the metrics are calculated for each label and then an unweighted average is taken of those values. When micro-averaged, the metric is calculated globally by counting the total number of correctly predicted rows.
JSON representation |
---|
{ "precision": number, "recall": number, "accuracy": number, "threshold": number, "f1Score": number, "logLoss": number, "rocAuc": number } |
Field | Description
---|---
precision | Precision is the fraction of actual positive predictions that had positive actual labels. For multiclass, this is a macro-averaged metric treating each class as a binary classifier.
recall | Recall is the fraction of actual positive labels that were given a positive prediction. For multiclass, this is a macro-averaged metric.
accuracy | Accuracy is the fraction of predictions given the correct label. For multiclass, this is a micro-averaged metric.
threshold | Threshold at which the metrics are computed. For binary classification models this is the positive class threshold. For multi-class classification models this is the confidence threshold.
f1Score | The F1 score is an average of recall and precision. For multiclass, this is a macro-averaged metric.
logLoss | Logarithmic loss. For multiclass, this is a macro-averaged metric.
rocAuc | Area under a ROC curve. For multiclass, this is a macro-averaged metric.
BinaryConfusionMatrix
Confusion matrix for binary classification models.
JSON representation |
---|
{ "positiveClassThreshold": number, "truePositives": string, "falsePositives": string, "trueNegatives": string, "falseNegatives": string, "precision": number, "recall": number, "f1Score": number, "accuracy": number } |
Field | Description
---|---
positiveClassThreshold | Threshold value used when computing each of the following metrics.
truePositives | Number of true samples predicted as true.
falsePositives | Number of false samples predicted as true.
trueNegatives | Number of true samples predicted as false.
falseNegatives | Number of false samples predicted as false.
precision | The fraction of actual positive predictions that had positive actual labels.
recall | The fraction of actual positive labels that were given a positive prediction.
f1Score | The equally weighted average of recall and precision.
accuracy | The fraction of predictions given the correct label.
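The derived metrics in a BinaryConfusionMatrix follow from the four counts. The sketch below recomputes them, using the standard harmonic-mean form for the F1 score; it is illustrative and assumes the counts have already been parsed to integers (they are serialized as strings in the JSON).

```python
# Sketch: derive precision/recall/f1Score/accuracy from the four counts of a
# BinaryConfusionMatrix, per the definitions above.

def binary_confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 as the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f1Score": f1, "accuracy": accuracy}
```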
MultiClassClassificationMetrics
Evaluation metrics for multi-class classification/classifier models.
JSON representation |
---|
{ "aggregateClassificationMetrics": { object ( |
Field | Description
---|---
aggregateClassificationMetrics | Aggregate classification metrics.
confusionMatrixList[] | Confusion matrix at different thresholds.
ConfusionMatrix
Confusion matrix for multi-class classification models.
JSON representation |
---|
{ "confidenceThreshold": number, "rows": [ { object (Row) } ] } |
Field | Description
---|---
confidenceThreshold | Confidence threshold used when computing the entries of the confusion matrix.
rows[] | One row per actual label.
Row
A single row in the confusion matrix.
JSON representation |
---|
{ "actualLabel": string, "entries": [ { object (Entry) } ] } |
Field | Description
---|---
actualLabel | The original label of this row.
entries[] | Info describing the predicted label distribution.
Entry
A single entry in the confusion matrix.
JSON representation |
---|
{ "predictedLabel": string, "itemCount": string } |
Field | Description
---|---
predictedLabel | The predicted label. For confidenceThreshold > 0, we will also add an entry indicating the number of items under the confidence threshold.
itemCount | Number of items being predicted as this label.
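A worked sketch that computes micro-averaged accuracy from a ConfusionMatrix in the rows/entries form above; itemCount values are strings in the JSON, so they are cast to integers.

```python
# Sketch: micro-averaged accuracy from a ConfusionMatrix dict
# ({"confidenceThreshold": ..., "rows": [{"actualLabel": ..., "entries": [...]}]}).

def confusion_matrix_accuracy(confusion_matrix: dict) -> float:
    correct = total = 0
    for row in confusion_matrix.get("rows", []):
        actual = row.get("actualLabel")
        for entry in row.get("entries", []):
            count = int(entry.get("itemCount", 0))
            total += count
            if entry.get("predictedLabel") == actual:
                correct += count
    return correct / total if total else 0.0
```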
ClusteringMetrics
Evaluation metrics for clustering models.
JSON representation |
---|
{ "daviesBouldinIndex": number, "meanSquaredDistance": number, "clusters": [ { object (Cluster) } ] } |
Field | Description
---|---
daviesBouldinIndex | Davies-Bouldin index.
meanSquaredDistance | Mean of squared distances between each sample and its cluster centroid.
clusters[] | Information for all clusters.
Cluster
Message containing the information about one cluster.
JSON representation |
---|
{ "centroidId": string, "featureValues": [ { object (FeatureValue) } ], "count": string } |
Field | Description
---|---
centroidId | Centroid id.
featureValues[] | Values of highly variant features for this cluster.
count | Count of training data rows that were assigned to this cluster.
FeatureValue
Representative value of a single feature within the cluster.
JSON representation |
---|
{ "featureColumn": string, // Union field |
Field | Description
---|---
featureColumn | The feature column name.
Union field value. value can be only one of the following: |
numericalValue | The numerical feature value. This is the centroid value for this feature.
categoricalValue | The categorical feature value.
CategoricalValue
Representative value of a categorical feature.
JSON representation |
---|
{ "categoryCounts": [ { object (CategoryCount) } ] } |
Field | Description
---|---
categoryCounts[] | Counts of all categories for the categorical feature. If there are more than ten categories, we return the top ten (by count) and return one more CategoryCount with category "_OTHER_" and count as the aggregate count of the remaining categories.
CategoryCount
Represents the count of a single category within the cluster.
JSON representation |
---|
{ "category": string, "count": string } |
Field | Description
---|---
category | The name of the category.
count | The count of training samples matching the category within the cluster.
RankingMetrics
Evaluation metrics used by weighted-ALS models specified by feedbackType=implicit.
JSON representation |
---|
{ "meanAveragePrecision": number, "meanSquaredError": number, "normalizedDiscountedCumulativeGain": number, "averageRank": number } |
Field | Description
---|---
meanAveragePrecision | Calculates a precision per user for all the items by ranking them and then averages all the precisions across all the users.
meanSquaredError | Similar to the mean squared error computed in regression and explicit recommendation models, except that instead of computing the rating directly, the output from evaluate is computed against a preference which is 1 or 0 depending on whether the rating exists or not.
normalizedDiscountedCumulativeGain | A metric to determine the goodness of a ranking calculated from the predicted confidence by comparing it to an ideal rank measured by the original ratings.
averageRank | Determines the goodness of a ranking by computing the percentile rank from the predicted confidence and dividing it by the original rank.
ArimaForecastingMetrics
Model evaluation metrics for ARIMA forecasting models.
JSON representation |
---|
{ "nonSeasonalOrder": [ { object ( |
Field | Description
---|---
nonSeasonalOrder[] | Non-seasonal order.
arimaFittingMetrics[] | Arima model fitting metrics.
seasonalPeriods[] | Seasonal periods. Repeated because multiple periods are supported for one time series.
hasDrift[] | Whether the Arima model was fitted with drift or not. It is always false when d is not 1.
timeSeriesId[] | Id to differentiate different time series for the large-scale case.
arimaSingleModelForecastingMetrics[] | Repeated as there can be many metric sets (one for each model) in auto-arima and the large-scale case.
ArimaSingleModelForecastingMetrics
Model evaluation metrics for a single ARIMA forecasting model.
JSON representation |
---|
{ "nonSeasonalOrder": { object ( |
Field | Description
---|---
nonSeasonalOrder | Non-seasonal order.
arimaFittingMetrics | Arima fitting metrics.
hasDrift | Whether the Arima model was fitted with drift or not. It is always false when d is not 1.
timeSeriesId | The timeSeriesId value for this time series. It will be one of the unique values from the timeSeriesIdColumn specified during ARIMA model training. Only present when the timeSeriesIdColumn training option was used.
timeSeriesIds[] | The tuple of timeSeriesIds identifying this time series. It will be one of the unique tuples of values present in the timeSeriesIdColumns specified during ARIMA model training. Only present when the timeSeriesIdColumns training option was used, and the order of values here is the same as the order of timeSeriesIdColumns.
seasonalPeriods[] | Seasonal periods. Repeated because multiple periods are supported for one time series.
hasHolidayEffect | If true, holiday_effect is a part of the time series decomposition result.
hasSpikesAndDips | If true, spikes_and_dips is a part of the time series decomposition result.
hasStepChanges | If true, step_changes is a part of the time series decomposition result.
DimensionalityReductionMetrics
Model evaluation metrics for dimensionality reduction models.
JSON representation |
---|
{ "totalExplainedVarianceRatio": number } |
Field | Description
---|---
totalExplainedVarianceRatio | Total percentage of variance explained by the selected principal components.
DataSplitResult
Data split result. This contains references to the training and evaluation data tables that were used to train the model.
JSON representation |
---|
{ "trainingTable": { object ( |
Field | Description
---|---
trainingTable | Table reference of the training data after split.
evaluationTable | Table reference of the evaluation data after split.
testTable | Table reference of the test data after split.
GlobalExplanation
Global explanations containing the top most important features after training.
JSON representation |
---|
{ "explanations": [ { object (Explanation) } ], "classLabel": string } |
Field | Description
---|---
explanations[] | A list of the top global explanations. Sorted by absolute value of attribution in descending order.
classLabel | Class label for this set of global explanations. Will be empty/null for binary logistic and linear regression models. Sorted alphabetically in descending order.
Explanation
Explanation for a single feature.
JSON representation |
---|
{ "featureName": string, "attribution": number } |
Field | Description
---|---
featureName | The full feature name. For non-numerical features, will be formatted like <column_name>.<encoded_feature_name>.
attribution | Attribution of feature.
TransformColumn
Information about a single transform column.
JSON representation |
---|
{ "name": string, "type": { object (StandardSqlDataType) }, "transformSql": string } |
Field | Description
---|---
name | Output only. Name of the column.
type | Output only. Data type of the column after the transform.
transformSql | Output only. The SQL expression used in the column transform.
HparamSearchSpaces
Hyperparameter search spaces. These should be a subset of trainingOptions.
JSON representation |
---|
{ "learnRate": { object ( |
Field | Description
---|---
learnRate | Learning rate of training jobs.
l1Reg | L1 regularization coefficient.
l2Reg | L2 regularization coefficient.
numClusters | Number of clusters for k-means.
numFactors | Number of latent factors to train on.
batchSize | Mini batch sample size.
dropout | Dropout probability for dnn model training and boosted tree models using dart booster.
maxTreeDepth | Maximum depth of a tree for boosted tree models.
subsample | Subsample the training data to grow tree to prevent overfitting for boosted tree models.
minSplitLoss | Minimum split loss for boosted tree models.
walsAlpha | Hyperparameter for matrix factorization when implicit feedback type is specified.
boosterType | Booster type for boosted tree models.
numParallelTree | Number of parallel trees for boosted tree models.
dartNormalizeType | Dart normalization type for boosted tree models.
treeMethod | Tree construction algorithm for boosted tree models.
minTreeChildWeight | Minimum sum of instance weight needed in a child for boosted tree models.
colsampleBytree | Subsample ratio of columns when constructing each tree for boosted tree models.
colsampleBylevel | Subsample ratio of columns for each level for boosted tree models.
colsampleBynode | Subsample ratio of columns for each node (split) for boosted tree models.
activationFn | Activation functions of neural network models.
optimizer | Optimizer of TF models.
DoubleHparamSearchSpace
Search space for a double hyperparameter.
JSON representation |
---|
{ // Union field |
Field | Description
---|---
Union field search_space. search_space can be only one of the following: |
range | Range of the double hyperparameter.
candidates | Candidates of the double hyperparameter.
DoubleRange
Range of a double hyperparameter.
JSON representation |
---|
{ "min": number, "max": number } |
Field | Description
---|---
min | Min value of the double parameter.
max | Max value of the double parameter.
DoubleCandidates
Discrete candidates of a double hyperparameter.
JSON representation |
---|
{ "candidates": [ number ] } |
Field | Description
---|---
candidates[] | Candidates for the double parameter in increasing order.
IntHparamSearchSpace
Search space for an int hyperparameter.
JSON representation |
---|
{ // Union field |
Field | Description
---|---
Union field search_space. search_space can be only one of the following: |
range | Range of the int hyperparameter.
candidates | Candidates of the int hyperparameter.
IntRange
Range of an int hyperparameter.
JSON representation |
---|
{ "min": string, "max": string } |
Field | Description
---|---
min | Min value of the int parameter.
max | Max value of the int parameter.
IntCandidates
Discrete candidates of an int hyperparameter.
JSON representation |
---|
{ "candidates": [ string ] } |
Field | Description
---|---
candidates[] | Candidates for the int parameter in increasing order.
IntArrayHparamSearchSpace
Search space for int array.
JSON representation |
---|
{ "candidates": [ { object (IntArray) } ] } |
Field | Description
---|---
candidates[] | Candidates for the int array parameter.
IntArray
An array of int.
JSON representation |
---|
{ "elements": [ string ] } |
Field | Description
---|---
elements[] | Elements in the int array.
StringHparamSearchSpace
Search space for string and enum.
JSON representation |
---|
{ "candidates": [ string ] } |
Field | Description
---|---
candidates[] | Candidates for the string or enum parameter, in lower case.
HparamTuningTrial
Training info of a trial in hyperparameter tuning models.
JSON representation |
---|
{ "trialId": string, "startTimeMs": string, "endTimeMs": string, "hparams": { object ( |
Field | Description
---|---
trialId | 1-based index of the trial.
startTimeMs | Starting time of the trial.
endTimeMs | Ending time of the trial.
hparams | The hyperparameters selected for this trial.
evaluationMetrics | Evaluation metrics of this trial calculated on the test data. Empty in Job API.
status | The status of the trial.
errorMessage | Error message for FAILED and INFEASIBLE trials.
trainingLoss | Loss computed on the training data at the end of the trial.
evalLoss | Loss computed on the eval data at the end of the trial.
hparamTuningEvaluationMetrics | Hyperparameter tuning evaluation metrics of this trial calculated on the eval data. Unlike evaluationMetrics, only the fields corresponding to the hparamTuningObjectives are set.
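For single-objective tuning where a lower eval loss is better, the best trial can be picked from hparamTrials as sketched below. The server-computed defaultTrialId and optimalTrialIds fields on the Model resource remain the authoritative source; this is only an illustration.

```python
# Sketch: pick the SUCCEEDED trial with the lowest evalLoss from the
# hparamTrials list of a Model resource (list of dicts, field names as above).

def best_trial(hparam_trials):
    succeeded = [t for t in hparam_trials if t.get("status") == "SUCCEEDED"]
    if not succeeded:
        return None
    return min(succeeded, key=lambda t: t.get("evalLoss", float("inf")))
```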
TrialStatus
Current status of the trial.
Enum | Description
---|---
TRIAL_STATUS_UNSPECIFIED | Default value.
NOT_STARTED | Scheduled but not started.
RUNNING | Running state.
SUCCEEDED | The trial succeeded.
FAILED | The trial failed.
INFEASIBLE | The trial is infeasible due to invalid parameters.
STOPPED_EARLY | The trial stopped early because it was not promising.
RemoteModelInfo
Remote Model Info
JSON representation |
---|
{ "connection": string, "maxBatchingRows": string, "remoteModelVersion": string, // Union field |
Field | Description
---|---
connection | Output only. Fully qualified name of the user-provided connection object of the remote model. Format: "projects/{project_id}/locations/{location_id}/connections/{connection_id}"
maxBatchingRows | Output only. Max number of rows in each batch sent to the remote service. If unset, the number of rows in each batch is set dynamically.
remoteModelVersion | Output only. The model version for LLM.
Union field remote_service. Remote services are services outside of BigQuery used by remote models for predictions. A remote service is backed by either an arbitrary endpoint or a selected remote service type, but not both. remote_service can be only one of the following: |
endpoint | Output only. The endpoint for the remote model.
remoteServiceType | Output only. The remote service type for the remote model.
RemoteServiceType
Supported service type for remote model.
Enum | Description
---|---
REMOTE_SERVICE_TYPE_UNSPECIFIED | Unspecified remote service type.
CLOUD_AI_TRANSLATE_V3 | V3 Cloud AI Translation API. See more details at Cloud Translation API.
CLOUD_AI_VISION_V1 | V1 Cloud AI Vision API. See more details at Cloud Vision API.
CLOUD_AI_NATURAL_LANGUAGE_V1 | V1 Cloud AI Natural Language API. See more details at REST Resource: documents.
Methods
Method | Description
---|---
delete | Deletes the model specified by modelId from the dataset.
get | Gets the specified model resource by model ID.
list | Lists all models in the specified dataset.
patch | Patch specific fields in the specified model.
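The four methods map onto the google-cloud-bigquery Python client roughly as sketched below. The project, dataset, and model names are hypothetical, and the client methods (list_models, get_model, update_model, delete_model) are the library's wrappers around these REST methods; exact signatures may vary between client versions.

```python
# Sketch of the four model methods using the google-cloud-bigquery client.
from google.cloud import bigquery

client = bigquery.Client(project="my_project")

# models.list: iterate over the models in a dataset.
for model in client.list_models("my_project.my_dataset"):
    print(model.model_id, model.model_type)

# models.get: fetch the full model resource.
model = client.get_model("my_project.my_dataset.my_model")

# models.patch: update only the listed fields.
model.description = "Updated via the API"
client.update_model(model, ["description"])

# models.delete: remove the model from the dataset.
client.delete_model("my_project.my_dataset.my_model")
```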