REST Resource: projects.models.versions

Resource: Version

Represents a version of the model.

Each version is a trained model deployed in the cloud, ready to handle prediction requests. A model can have multiple versions. You can get information about all of the versions of a given model by calling projects.models.versions.list.

JSON representation
{
  "name": string,
  "description": string,
  "isDefault": boolean,
  "deploymentUri": string,
  "createTime": string,
  "lastUseTime": string,
  "runtimeVersion": string,
  "machineType": string,
  "state": enum (State),
  "errorMessage": string,
  "packageUris": [
    string
  ],
  "labels": {
    string: string,
    ...
  },
  "etag": string,
  "framework": enum (Framework),
  "pythonVersion": string,
  "acceleratorConfig": {
    object (AcceleratorConfig)
  },
  "serviceAccount": string,
  "requestLoggingConfig": {
    object (RequestLoggingConfig)
  },
  "explanationConfig": {
    object (ExplanationConfig)
  },

  // Union field scaling can be only one of the following:
  "autoScaling": {
    object (AutoScaling)
  },
  "manualScaling": {
    object (ManualScaling)
  },
  // End of list of possible types for union field scaling.
  "predictionClass": string
}


name
string

Required. The name specified for the version when it was created.

The version name must be unique within the model it is created in.



description
string

Optional. The description specified for the version when it was created.



isDefault
boolean

Output only. If true, this version will be used to handle prediction requests that do not specify a version.

You can change the default version by calling projects.models.versions.setDefault.



deploymentUri
string

Required. The Cloud Storage location of the trained model used to create the version. See the guide to model deployment for more information.

When you pass a Version to projects.models.versions.create, the model service uses the specified location as the source of the model. Once deployed, the model version is hosted by the prediction service, so this location is useful only as a historical record. The total number of model files can't exceed 1000.


createTime
string (Timestamp format)

Output only. The time the version was created.


lastUseTime
string (Timestamp format)

Output only. The time the version was last used for prediction.



runtimeVersion
string

Required. The AI Platform runtime version to use for this deployment.

For more information, see the runtime version list and how to manage runtime versions.



machineType
string

Optional. The type of machine on which to serve the model. Currently applies only to the online prediction service. If this field is not specified, it defaults to mls1-c1-m2.

Online prediction supports the following machine types:

  • mls1-c1-m2
  • mls1-c4-m2
  • n1-standard-2
  • n1-standard-4
  • n1-standard-8
  • n1-standard-16
  • n1-standard-32
  • n1-highmem-2
  • n1-highmem-4
  • n1-highmem-8
  • n1-highmem-16
  • n1-highmem-32
  • n1-highcpu-2
  • n1-highcpu-4
  • n1-highcpu-8
  • n1-highcpu-16
  • n1-highcpu-32

mls1-c1-m2 is generally available. All other machine types are available in beta. Learn more about the differences between machine types.


state
enum (State)

Output only. The state of a version.



errorMessage
string

Output only. The details of a failure or a cancellation.



packageUris[]
string

Optional. Cloud Storage paths (gs://…) of packages for custom prediction routines or scikit-learn pipelines with custom code.

For a custom prediction routine, one of these packages must contain your Predictor class (see predictionClass). Additionally, include any dependencies that your Predictor or scikit-learn pipeline uses and that are not already included in your selected runtime version.

If you specify this field, you must also set runtimeVersion to 1.4 or greater.


labels
map (key: string, value: string)

Optional. One or more labels that you can add to organize your model versions. Each label is a key-value pair, where both the key and the value are arbitrary strings that you supply. For more information, see the documentation on using labels.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.


etag
string (bytes format)

etag is used for optimistic concurrency control, which helps prevent simultaneous updates of a model from overwriting each other. Systems are strongly encouraged to use the etag in the read-modify-write cycle when updating a model, to avoid race conditions: an etag is returned in the response to versions.get, and systems are expected to put that etag in the request to versions.patch to ensure that their change is applied to the model as intended.

A base64-encoded string.
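
The read-modify-write cycle described for etag can be sketched with an in-memory stand-in for the service. FakeVersionStore, its get and patch methods, and the way the etag is derived are all hypothetical; they only illustrate the pattern, and the real service treats the etag as an opaque value.

```python
import base64
import hashlib
import json


class EtagMismatch(Exception):
    """Raised when the caller's etag no longer matches the stored resource."""


class FakeVersionStore:
    """Hypothetical in-memory stand-in for versions.get / versions.patch."""

    def __init__(self, resource):
        self._resource = dict(resource)

    def _etag(self):
        # Assumption: derive the etag from the resource content. The real
        # service only promises an opaque base64-encoded string.
        payload = json.dumps(self._resource, sort_keys=True).encode("utf-8")
        return base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")

    def get(self):
        return dict(self._resource, etag=self._etag())

    def patch(self, changes, etag):
        # Reject the write if the resource changed since the caller read it.
        if etag != self._etag():
            raise EtagMismatch("resource was modified; re-read and retry")
        self._resource.update(changes)
        return self.get()


store = FakeVersionStore({"name": "v1", "description": "first"})

# Two clients read the same state.
client_a = store.get()
client_b = store.get()

# Client A writes first and succeeds.
store.patch({"description": "updated by A"}, etag=client_a["etag"])

# Client B's etag is now stale, so its write is rejected instead of
# silently overwriting A's change.
try:
    store.patch({"description": "updated by B"}, etag=client_b["etag"])
    b_succeeded = True
except EtagMismatch:
    b_succeeded = False
```

A client that receives the mismatch would normally call get again, reapply its change to the fresh resource, and retry the patch with the new etag.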


framework
enum (Framework)

Optional. The machine learning framework AI Platform uses to train this version of the model. Valid values are TENSORFLOW, SCIKIT_LEARN, and XGBOOST. If you do not specify a framework, AI Platform will analyze files in the deploymentUri to determine a framework. If you choose SCIKIT_LEARN or XGBOOST, you must also set the runtime version of the model to 1.4 or greater.

Do not specify a framework if you're deploying a custom prediction routine.

If you specify a Compute Engine (N1) machine type in the machineType field, you must specify TENSORFLOW for the framework.
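
The framework rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: check_framework and parse_runtime_version are hypothetical helpers, the input is a plain dict shaped like a Version, and Compute Engine machine types are assumed to be recognizable by their n1- prefix.

```python
def parse_runtime_version(value):
    """Turns a string such as '1.15' into a comparable tuple (1, 15)."""
    return tuple(int(part) for part in value.split("."))


def check_framework(version):
    """Returns a list of problems found in a Version-like dict."""
    problems = []
    framework = version.get("framework")
    machine_type = version.get("machineType", "mls1-c1-m2")
    runtime = parse_runtime_version(version["runtimeVersion"])

    if framework in ("SCIKIT_LEARN", "XGBOOST") and runtime < (1, 4):
        problems.append("SCIKIT_LEARN and XGBOOST need runtimeVersion 1.4 or greater")

    # Assumption: Compute Engine (N1) machine types start with "n1-".
    if machine_type.startswith("n1-") and framework != "TENSORFLOW":
        problems.append("N1 machine types need framework TENSORFLOW")

    # A custom prediction routine is identified here by predictionClass.
    if version.get("predictionClass") and framework is not None:
        problems.append("do not set framework for a custom prediction routine")

    return problems


ok = check_framework(
    {"runtimeVersion": "1.15", "framework": "TENSORFLOW", "machineType": "n1-standard-4"}
)
too_old = check_framework({"runtimeVersion": "1.3", "framework": "XGBOOST"})
wrong_machine = check_framework(
    {"runtimeVersion": "1.15", "framework": "SCIKIT_LEARN", "machineType": "n1-standard-2"}
)
```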



pythonVersion
string

Required. The version of Python used in prediction.

The following Python versions are available:

  • Python '3.7' is available when runtimeVersion is set to '1.15' or later.
  • Python '3.5' is available when runtimeVersion is set to a version from '1.4' to '1.14'.
  • Python '2.7' is available when runtimeVersion is set to '1.15' or earlier.

Read more about the Python versions available for each runtime version.
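
The availability rules listed above translate directly into a lookup. This is an illustrative sketch only: available_python_versions is a hypothetical helper, and it assumes runtime versions always have the two-part form major.minor.

```python
def available_python_versions(runtime_version):
    """Lists the Python versions the rules above allow for a runtime version."""
    major, minor = (int(part) for part in runtime_version.split("."))
    runtime = (major, minor)

    versions = []
    if runtime >= (1, 15):
        versions.append("3.7")
    if (1, 4) <= runtime <= (1, 14):
        versions.append("3.5")
    if runtime <= (1, 15):
        versions.append("2.7")
    return versions


for_1_15 = available_python_versions("1.15")
for_1_14 = available_python_versions("1.14")
```

Comparing tuples rather than strings matters here: as strings, "1.4" sorts after "1.15", which would give the wrong answer.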


acceleratorConfig
object (AcceleratorConfig)

Optional. Accelerator config for using GPUs for online prediction (beta). Only specify this field if you have specified a Compute Engine (N1) machine type in the machineType field. Learn more about using GPUs for online prediction.



serviceAccount
string

Optional. Specifies the service account for resource access control.


requestLoggingConfig
object (RequestLoggingConfig)

Optional. Only specify this field in a projects.models.versions.patch request. Specifying it in a projects.models.versions.create request has no effect.

Configures the request-response pair logging on predictions from this Version.


explanationConfig
object (ExplanationConfig)

Optional. Configures explainability features on the model's version. Some explanation features require additional metadata to be loaded as part of the model payload.

Union field scaling. Optional. Sets the options for scaling. If not specified, defaults to auto_scaling with min_nodes of 0 (see doc for AutoScaling.min_nodes). The scaling field can be only one of the following:

autoScaling
object (AutoScaling)

Automatically scale the number of nodes used to serve the model in response to increases and decreases in traffic. Care should be taken to ramp up traffic according to the model's ability to scale or you will start seeing increases in latency and 429 response codes.

Note that you cannot use AutoScaling if your version uses GPUs. Instead, you must specify manualScaling.


manualScaling
object (ManualScaling)

Manually select the number of nodes to use for serving the model. You should generally use autoScaling with an appropriate minNodes instead, but this option is available if you want more predictable billing. Beware that latency and error rates will increase if the traffic exceeds the capability of the system to serve it based on the selected number of nodes.
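
The constraints on the scaling union can be expressed as a small client-side check. This is a sketch under stated assumptions: check_scaling is a hypothetical helper, the input is a plain dict shaped like a Version, and the presence of acceleratorConfig is taken to mean that the version uses GPUs.

```python
def check_scaling(version):
    """Returns a list of problems with the scaling fields of a Version-like dict."""
    problems = []
    has_auto = "autoScaling" in version
    has_manual = "manualScaling" in version

    # scaling is a union field, so at most one member may be set.
    if has_auto and has_manual:
        problems.append("set only one of autoScaling or manualScaling")

    # Assumption: an acceleratorConfig means the version uses GPUs.
    # Leaving scaling unset falls back to auto scaling, which GPUs do not
    # support, so manualScaling has to be given explicitly.
    uses_gpus = "acceleratorConfig" in version
    if uses_gpus and not has_manual:
        problems.append("versions that use GPUs must specify manualScaling")

    return problems


gpu_manual = check_scaling({"acceleratorConfig": {}, "manualScaling": {"nodes": 2}})
gpu_auto = check_scaling({"acceleratorConfig": {}, "autoScaling": {"minNodes": 1}})
both = check_scaling({"autoScaling": {"minNodes": 1}, "manualScaling": {"nodes": 2}})
```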



predictionClass
string

Optional. The fully qualified name (module_name.class_name) of a class that implements the Predictor interface described in this reference field. The module containing this class should be included in a package provided to the packageUris field.

Specify this field if and only if you are deploying a custom prediction routine (beta). If you specify this field, you must set runtimeVersion to 1.4 or greater and you must set machineType to a legacy (MLS1) machine type.

The following code sample provides the Predictor interface:

class Predictor(object):
    """Interface for constructing custom predictors."""

    def predict(self, instances, **kwargs):
        """Performs custom prediction.

        Instances are the decoded values from the request. They have already
        been deserialized from JSON.

        Args:
            instances: A list of prediction input instances.
            **kwargs: A dictionary of keyword args provided as additional
                fields on the predict request body.

        Returns:
            A list of outputs containing the prediction results. This list must
            be JSON serializable.
        """
        raise NotImplementedError()

    @classmethod
    def from_path(cls, model_dir):
        """Creates an instance of Predictor using the given path.

        Loading of the predictor should be done in this method.

        Args:
            model_dir: The local directory that contains the exported model
                file along with any additional files uploaded when creating the
                version resource.

        Returns:
            An instance implementing this Predictor class.
        """
        raise NotImplementedError()

Learn more about the Predictor interface and custom prediction routines.
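
As an illustration of the interface, the sketch below implements both methods for a toy linear model. LinearPredictor, the model.json file name, and its weights and bias keys are hypothetical choices made for this example; a real predictor would load whatever artifacts were uploaded with the version.

```python
import json
import os
import tempfile


class LinearPredictor(object):
    """Hypothetical predictor that scores instances with a linear model."""

    def __init__(self, weights, bias):
        self._weights = weights
        self._bias = bias

    def predict(self, instances, **kwargs):
        # Each instance is assumed to be a list of numbers with the same
        # length as the weight vector.
        outputs = []
        for instance in instances:
            score = sum(w * x for w, x in zip(self._weights, instance))
            outputs.append(score + self._bias)
        return outputs

    @classmethod
    def from_path(cls, model_dir):
        # Assumption: the model was exported as model.json in model_dir.
        with open(os.path.join(model_dir, "model.json")) as f:
            params = json.load(f)
        return cls(params["weights"], params["bias"])


# Local usage: write a toy model to a temporary directory, load it through
# from_path, and run a prediction.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "model.json"), "w") as f:
        json.dump({"weights": [1.0, 2.0], "bias": 0.5}, f)
    predictor = LinearPredictor.from_path(model_dir)

predictions = predictor.predict([[1, 1], [0, 2]])
serialized = json.dumps(predictions)  # the result must be JSON serializable
```

All loading happens in from_path, so predict only works with state that is already in memory.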


AutoScaling

Options for automatically scaling a model.

JSON representation
{
  "minNodes": integer
}


minNodes
integer

Optional. The minimum number of nodes to allocate for this model. These nodes are always up, starting from the time the model is deployed. Therefore, the cost of operating this model will be at least rate * minNodes * number of hours since last billing cycle, where rate is the cost per node-hour as documented in the pricing guide, even if no predictions are performed. There is additional cost for each prediction performed.

Unlike manual scaling, if the load gets too heavy for the nodes that are up, the service will automatically add nodes to handle the increased load as well as scale back as traffic drops, always maintaining at least minNodes. You will be charged for the time in which additional nodes are used.

If minNodes is not specified and AutoScaling is used with a legacy (MLS1) machine type, minNodes defaults to 0, in which case, when traffic to a model stops (and after a cool-down period), nodes will be shut down and no charges will be incurred until traffic to the model resumes.

If minNodes is not specified and AutoScaling is used with a Compute Engine (N1) machine type, minNodes defaults to 1. minNodes must be at least 1 for use with a Compute Engine machine type.
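
The default for minNodes and the cost floor it implies can be worked through in a few lines. This is a sketch only: effective_min_nodes and minimum_cost are hypothetical helpers, the rate used below is a made-up number rather than a real price, and Compute Engine machine types are assumed to be recognizable by their n1- prefix.

```python
def effective_min_nodes(machine_type, min_nodes=None):
    """Resolves minNodes using the defaults described above."""
    # Assumption: Compute Engine (N1) machine types start with "n1-".
    is_n1 = machine_type.startswith("n1-")
    if min_nodes is None:
        return 1 if is_n1 else 0
    if is_n1 and min_nodes < 1:
        raise ValueError("minNodes must be at least 1 for Compute Engine machine types")
    return min_nodes


def minimum_cost(rate, machine_type, hours, min_nodes=None):
    """Lower bound on cost: rate * minNodes * hours, before per-prediction charges."""
    return rate * effective_min_nodes(machine_type, min_nodes) * hours


# Hypothetical rate of 0.25 per node-hour over 8 hours with two nodes kept up.
floor_n1 = minimum_cost(0.25, "n1-standard-4", 8, min_nodes=2)

# With a legacy machine type and no minNodes, the floor is zero because all
# nodes can be shut down when traffic stops.
floor_mls1 = minimum_cost(0.25, "mls1-c1-m2", 8)
```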

Note that you cannot use AutoScaling if your version uses GPUs. Instead, you must use ManualScaling.

You can set minNodes when creating the model version, and you can also update minNodes for an existing version:

update_body.json:

{
  "autoScaling": {
    "minNodes": 5
  }
}

HTTP request:

-d @./update_body.json


ManualScaling

Options for manually scaling a model.

JSON representation
{
  "nodes": integer
}


nodes
integer

The number of nodes to allocate for this model. These nodes are always up, starting from the time the model is deployed, so the cost of operating this model will be proportional to nodes * number of hours since last billing cycle plus the cost for each prediction performed.


State

Describes the version state.

UNKNOWN The version state is unspecified.
READY The version is ready for prediction.
CREATING The version is being created. New versions.patch and versions.delete requests will fail if a version is in the CREATING state.
FAILED The version failed to be created, possibly cancelled. errorMessage should contain the details of the failure.
DELETING The version is being deleted. New versions.patch and versions.delete requests will fail if a version is in the DELETING state.
UPDATING The version is being updated. New versions.patch and versions.delete requests will fail if a version is in the UPDATING state.
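
A client can use the state to avoid sending requests that are known to fail. The sketch below encodes only the three states named above; can_modify and BUSY_STATES are hypothetical names, and a True result does not guarantee that the request will succeed for other reasons.

```python
# States in which new versions.patch and versions.delete requests fail.
BUSY_STATES = frozenset({"CREATING", "DELETING", "UPDATING"})


def can_modify(state):
    """Returns False when a patch or delete would be rejected because of state."""
    return state not in BUSY_STATES


ready_ok = can_modify("READY")
updating_ok = can_modify("UPDATING")
```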


Framework

Available frameworks for prediction.

FRAMEWORK_UNSPECIFIED Unspecified framework. Assigns a value based on the file suffix.
TENSORFLOW TensorFlow framework.
SCIKIT_LEARN Scikit-learn framework.
XGBOOST XGBoost framework.


RequestLoggingConfig

Configuration for logging request-response pairs to a BigQuery table. Online prediction requests to a model version and the responses to these requests are converted to raw strings and saved to the specified BigQuery table. Logging is constrained by BigQuery quotas and limits. If your project exceeds BigQuery quotas or limits, AI Platform Prediction does not log request-response pairs, but it continues to serve predictions.

If you are using continuous evaluation, you do not need to specify this configuration manually. Setting up continuous evaluation automatically enables logging of request-response pairs.

JSON representation
{
  "samplingPercentage": number,
  "bigqueryTableName": string
}


samplingPercentage
number

Percentage of requests to be logged, expressed as a fraction from 0 to 1. For example, if you want to log 10% of requests, enter 0.1. The sampling window is the lifetime of the model version. Defaults to 0.
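
To make the meaning of the fraction concrete, the sketch below samples requests independently with the given probability. This is only one plausible way to interpret samplingPercentage; how the service actually selects requests is not described here, and should_log is a hypothetical helper.

```python
import random


def should_log(sampling_percentage, rng=random):
    """Decides whether one request is logged, given a fraction from 0 to 1."""
    if not 0.0 <= sampling_percentage <= 1.0:
        raise ValueError("samplingPercentage must be a fraction from 0 to 1")
    # rng.random() is in [0, 1), so 0 never logs and 1 always logs.
    return rng.random() < sampling_percentage


# Seeded generator so that the run is repeatable.
rng = random.Random(0)
logged = sum(should_log(0.1, rng) for _ in range(10000))
```

With a fraction of 0.1, roughly one request in ten ends up logged over a long enough run.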



bigqueryTableName
string

Required. Fully qualified BigQuery table name in the following format: "projectId.dataset_name.table_name"

The specified table must already exist, and the "Cloud ML Service Agent" for your project must have permission to write to it. The table must have the following schema:

Field name    Type    Mode


ExplanationConfig

Message holding configuration options for explaining model predictions. There are two feature attribution methods supported for TensorFlow models: integrated gradients and sampled Shapley. Learn more about feature attributions.

JSON representation
{

  // Union field attribution_method can be only one of the following:
  "integratedGradientsAttribution": {
    object (IntegratedGradientsAttribution)
  },
  "sampledShapleyAttribution": {
    object (SampledShapleyAttribution)
  }
  // End of list of possible types for union field attribution_method.
}

Union field attribution_method. The attribution method to enable for explaining the model's predictions. attribution_method can be only one of the following:

integratedGradientsAttribution
object (IntegratedGradientsAttribution)


sampledShapleyAttribution
object (SampledShapleyAttribution)


IntegratedGradientsAttribution

Attributes credit by computing the Aumann-Shapley value, taking advantage of the model's fully differentiable structure. Refer to this paper for more details:

JSON representation
{
  "numIntegralSteps": integer
}


numIntegralSteps
integer

Number of steps for approximating the path integral. A good starting value is 50; increase it gradually until the sum-to-diff property is met within the desired error range.
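
The effect of the step count can be seen on a toy function. The sketch below approximates the path integral with a right-endpoint Riemann sum along a straight line from a baseline to the input, then measures how far the attributions are from satisfying the sum-to-diff property (their sum should equal f(x) - f(baseline)). The function f, its gradient, the zero baseline, and the choice of Riemann sum are all assumptions for illustration; the service's own approximation scheme is not specified here.

```python
def f(x):
    """Toy differentiable model with two input features."""
    return x[0] ** 3 + x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [3 * x[0] ** 2 + x[1], x[0]]


def integrated_gradients(x, baseline, num_integral_steps):
    """Approximates integrated gradients with a right-endpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(1, num_integral_steps + 1):
        alpha = k / num_integral_steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g
    return [
        (xi - b) * total / num_integral_steps
        for xi, b, total in zip(x, baseline, totals)
    ]


def sum_to_diff_error(x, baseline, num_integral_steps):
    """Absolute gap between the attribution sum and f(x) - f(baseline)."""
    attributions = integrated_gradients(x, baseline, num_integral_steps)
    return abs(sum(attributions) - (f(x) - f(baseline)))


x = [2.0, 1.0]
baseline = [0.0, 0.0]
error_50 = sum_to_diff_error(x, baseline, 50)
error_500 = sum_to_diff_error(x, baseline, 500)
```

For this function the gap shrinks roughly in proportion to 1 / numIntegralSteps, which is why raising the step count is the way to reach a desired error range.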


SampledShapleyAttribution

An attribution method that approximates Shapley values for features that contribute to the label being predicted. A sampling strategy is used to approximate the value rather than considering all subsets of features.

JSON representation
{
  "numPaths": integer
}


numPaths
integer

The number of feature permutations to consider when approximating the Shapley values.
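
Permutation sampling can be sketched as follows: for each sampled ordering of the features, add them one at a time and credit each feature with the change in model output it causes. The value function below (a toy model evaluated with absent features set to zero) and the helper names are assumptions for illustration, not the service's implementation.

```python
import random


def sampled_shapley(value_fn, num_features, num_paths, seed=0):
    """Estimates Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * num_features
    features = list(range(num_features))
    for _ in range(num_paths):
        rng.shuffle(features)
        included = set()
        previous = value_fn(included)
        for feature in features:
            included.add(feature)
            current = value_fn(included)
            totals[feature] += current - previous
            previous = current
    return [total / num_paths for total in totals]


def toy_value(included):
    """Output of the toy model 3*x0 + 2*x1*x2 at x = (1, 1, 1).

    Features not in `included` are replaced by a baseline of zero.
    """
    x = [1.0 if i in included else 0.0 for i in range(3)]
    return 3 * x[0] + 2 * x[1] * x[2]


estimates = sampled_shapley(toy_value, num_features=3, num_paths=1000)
```

For this toy model the exact Shapley values are 3, 1 and 1. The first feature is recovered exactly because its contribution does not depend on the ordering, while the other two converge as numPaths grows.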



Methods

create
Creates a new version of a model from a trained TensorFlow model.


delete
Deletes a model version.


get
Gets information about a model version.


list
Gets basic information about all the versions of a model.


patch
Updates the specified Version resource.


setDefault
Designates a version to be the default for the model.