REST Resource: projects.locations.endpoints

Resource: Endpoint

An Endpoint is the resource that Models are deployed into. Once a Model is deployed, the Endpoint is called to obtain predictions and explanations.

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "deployedModels": [
    {
      object (DeployedModel)
    }
  ],
  "trafficSplit": {
    string: integer,
    ...
  },
  "etag": string,
  "labels": {
    string: string,
    ...
  },
  "createTime": string,
  "updateTime": string,
  "encryptionSpec": {
    object (EncryptionSpec)
  },
  "modelDeploymentMonitoringJob": string
}
Fields
name

string

Output only. The resource name of the Endpoint.

displayName

string

Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description

string

The description of the Endpoint.

deployedModels[]

object (DeployedModel)

Output only. The models deployed in this Endpoint. To add or remove DeployedModels, use EndpointService.DeployModel and EndpointService.UndeployModel, respectively.

trafficSplit

map (key: string, value: integer)

A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's ID is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint should not accept any traffic at the moment.
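The rules above can be checked client-side before sending an update. The following is a minimal sketch, not part of any Google client library; the function name is hypothetical, and the checks that every key names a DeployedModel on the Endpoint and that each value lies in 0 to 100 are assumptions added for illustration.

```python
from typing import Dict, Set


def validate_traffic_split(traffic_split: Dict[str, int], deployed_model_ids: Set[str]) -> None:
    """Raise ValueError if traffic_split breaks the documented trafficSplit rules."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic at the moment.
        return
    # Assumption: every key should be the ID of a DeployedModel on this Endpoint.
    unknown = set(traffic_split) - deployed_model_ids
    if unknown:
        raise ValueError(f"unknown DeployedModel IDs: {sorted(unknown)}")
    # Assumption: each individual percentage is within 0..100.
    for model_id, percent in traffic_split.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage for {model_id} out of range: {percent}")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"percentages add up to {total}, expected 100")


# DeployedModel "3" is not listed, so it receives no traffic.
validate_traffic_split({"1": 80, "2": 20}, {"1", "2", "3"})
```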

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
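To illustrate what a consistent read-modify-write means in practice, the sketch below uses a hypothetical in-memory store, not the Vertex AI client or its error codes. An update that carries a stale etag is rejected, while an update without an etag overwrites blindly. The class name and the use of RuntimeError are assumptions.

```python
import copy
import uuid


class InMemoryEndpointStore:
    """Hypothetical store mimicking etag-based optimistic concurrency."""

    def __init__(self, resource: dict):
        # Every stored version gets a fresh etag.
        self._resource = dict(resource, etag=uuid.uuid4().hex)

    def get(self) -> dict:
        return copy.deepcopy(self._resource)

    def update(self, resource: dict) -> dict:
        sent_etag = resource.get("etag")
        if sent_etag is not None and sent_etag != self._resource["etag"]:
            # The resource changed after the caller read it.
            raise RuntimeError("etag mismatch: re-read the resource and retry")
        # With no etag, the write is a blind overwrite.
        body = {key: value for key, value in resource.items() if key != "etag"}
        self._resource = dict(body, etag=uuid.uuid4().hex)
        return copy.deepcopy(self._resource)


store = InMemoryEndpointStore({"displayName": "v1"})
first, second = store.get(), store.get()
first["displayName"] = "v2"
store.update(first)  # succeeds; second now holds a stale etag
```

A caller that gets the mismatch error would normally re-read the Endpoint, reapply its change, and retry.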

labels

map (key: string, value: string)

The labels with user-defined metadata to organize your Endpoints.

Label keys and values can be no longer than 64 characters (Unicode code points) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels.
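The label rules above can be approximated with a simple check. This is a simplified sketch: it covers only the length and character rules stated here and does not enforce any additional key constraints the service may apply, such as whether a key may be empty. Function names are hypothetical. Treating "lowercase letters" as letters that are unchanged by lower() is an assumption. Under this interpretation, letters from caseless scripts are accepted as international characters.

```python
from typing import Dict, List


def is_valid_label_part(text: str) -> bool:
    """Check one label key or value against the documented length and character rules."""
    if len(text) > 64:  # len() counts Unicode code points in Python 3
        return False
    for ch in text:
        if ch.isdigit() or ch in "_-":
            continue
        if ch.isalpha() and ch == ch.lower():
            # Lowercase letters, including international ones.
            continue
        return False
    return True


def invalid_label_keys(labels: Dict[str, str]) -> List[str]:
    """Return the keys whose key or value breaks the rules."""
    return [k for k, v in labels.items() if not (is_valid_label_part(k) and is_valid_label_part(v))]


print(invalid_label_keys({"env": "prod", "Team": "ml"}))
```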

createTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
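Python's datetime stores only microseconds, so nanosecond timestamps like the second example need separate handling. The sketch below is an assumption-based helper, not a library function. It accepts only the "Zulu" form described here and returns the whole-second UTC datetime together with the nanoseconds within that second.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Seconds part, optional 1-9 fractional digits, mandatory "Z" suffix.
_ZULU_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_timestamp(value: str) -> Tuple[datetime, int]:
    """Return (UTC datetime truncated to whole seconds, nanoseconds within that second)."""
    match = _ZULU_TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"not an RFC3339 UTC 'Zulu' timestamp: {value!r}")
    seconds = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Right-pad the fraction to 9 digits so ".5" becomes 500000000 ns.
    nanos = int((match.group(2) or "").ljust(9, "0"))
    return seconds, nanos


when, nanos = parse_timestamp("2014-10-02T15:01:23.045123456Z")
```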

updateTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was last updated.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

encryptionSpec

object (EncryptionSpec)

Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.

modelDeploymentMonitoringJob

string

Output only. Resource name of the Model Monitoring job associated with this Endpoint, if monitoring is enabled by modelDeploymentMonitoringJobs.create. Format: projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{modelDeploymentMonitoringJob}
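A resource name in the format above can be split back into its components. This is a minimal sketch with a hypothetical helper. It assumes the field is empty or absent when monitoring is not enabled, and that no component contains a slash.

```python
import re
from typing import Dict, Optional

_MONITORING_JOB_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/modelDeploymentMonitoringJobs/(?P<job>[^/]+)$"
)


def parse_monitoring_job_name(name: Optional[str]) -> Optional[Dict[str, str]]:
    """Return project, location, and job ID, or None if no job is associated."""
    if not name:
        # Assumption: an empty value means monitoring is not enabled.
        return None
    match = _MONITORING_JOB_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected resource name: {name!r}")
    return match.groupdict()


# Hypothetical example values.
parts = parse_monitoring_job_name(
    "projects/my-project/locations/us-central1/modelDeploymentMonitoringJobs/123"
)
```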

DeployedModel

A deployment of a Model. Endpoints contain one or more DeployedModels.

JSON representation
{
  "id": string,
  "model": string,
  "displayName": string,
  "createTime": string,
  "explanationSpec": {
    object (ExplanationSpec)
  },
  "serviceAccount": string,
  "disableContainerLogging": boolean,
  "enableAccessLogging": boolean,

  // Union field prediction_resources can be only one of the following:
  "dedicatedResources": {
    object (DedicatedResources)
  },
  "automaticResources": {
    object (AutomaticResources)
  }
  // End of list of possible types for union field prediction_resources.
}
Fields
id

string

Output only. The ID of the DeployedModel.

model

string

Required. The name of the Model that this DeployedModel is a deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint.

displayName

string

The display name of the DeployedModel. If not provided upon creation, the Model's displayName is used.

createTime

string (Timestamp format)

Output only. Timestamp when the DeployedModel was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

explanationSpec

object (ExplanationSpec)

Explanation configuration for this DeployedModel.

When deploying a Model using EndpointService.DeployModel, this value overrides the value of Model.explanation_spec. All fields of explanationSpec are optional in the request. If a field of explanationSpec is not populated, the value of the same field of Model.explanation_spec is inherited. If the corresponding Model.explanation_spec is not populated, all fields of the explanationSpec will be used for the explanation configuration.
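The override-then-inherit rule can be sketched as a field-by-field merge. This is an illustration under stated assumptions. The merge is applied only to top-level fields of the spec, because nested merging is not described here. The dictionary contents in the example are placeholders rather than real ExplanationSpec values. The function name is hypothetical.

```python
from typing import Any, Dict, Optional


def effective_explanation_spec(
    deployed_spec: Optional[Dict[str, Any]],
    model_spec: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Fields set on the DeployedModel win; unset fields are inherited from the Model."""
    merged = dict(model_spec or {})
    for field, value in (deployed_spec or {}).items():
        if value is not None:
            merged[field] = value
    return merged


model_spec = {"parameters": {"p": 1}, "metadata": {"m": 1}}  # placeholder values
deployed_spec = {"parameters": {"p": 2}}
spec = effective_explanation_spec(deployed_spec, model_spec)
```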

serviceAccount

string

The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.

Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

disableContainerLogging

boolean

For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances sends stderr and stdout streams to Stackdriver Logging by default. Note that these logs incur costs, which are subject to Cloud Logging pricing.

Users can disable container logging by setting this flag to true.

enableAccessLogging

boolean

If true, online prediction access logs are sent to Stackdriver Logging. These logs are like standard server access logs, containing information such as the timestamp and latency of each prediction request.

Note that Stackdriver logs may incur a cost, especially if your project receives prediction requests at a high queries-per-second (QPS) rate. Estimate your costs before enabling this option.

Union field prediction_resources. The prediction resources (for example, the machine) that the DeployedModel uses. The user is billed for these resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types; see Model.supported_deployment_resources_types. prediction_resources can be only one of the following:
dedicatedResources

object (DedicatedResources)

A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration.

automaticResources

object (AutomaticResources)

A description of resources that are, to a large degree, decided by Vertex AI and require only a modest amount of additional configuration.
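Because prediction_resources is a union field, a DeployedModel body may set at most one of its members. The helper below is hypothetical and catches the case where both members are set before a request is sent. Whether the service also rejects a body with neither member is not stated here, so the sketch returns None in that case instead of failing. The example model name and machine type are illustrative placeholders.

```python
from typing import Any, Dict, Optional

_PREDICTION_RESOURCES = ("dedicatedResources", "automaticResources")


def prediction_resources_kind(deployed_model: Dict[str, Any]) -> Optional[str]:
    """Return the union member that is set, or None; raise if more than one is set."""
    present = [field for field in _PREDICTION_RESOURCES if field in deployed_model]
    if len(present) > 1:
        raise ValueError(f"only one of {_PREDICTION_RESOURCES} may be set, got {present}")
    return present[0] if present else None


deployed_model = {
    "model": "projects/my-project/locations/us-central1/models/my-model",  # placeholder
    "dedicatedResources": {
        "machineSpec": {"machineType": "n1-standard-4"},  # placeholder machine type
        "minReplicaCount": 1,
    },
}
kind = prediction_resources_kind(deployed_model)
```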

DedicatedResources

A description of resources that are dedicated to a DeployedModel, and that need a higher degree of manual configuration.

JSON representation
{
  "machineSpec": {
    object (MachineSpec)
  },
  "minReplicaCount": integer,
  "maxReplicaCount": integer,
  "autoscalingMetricSpecs": [
    {
      object (AutoscalingMetricSpec)
    }
  ]
}
Fields
machineSpec

object (MachineSpec)

Required. Immutable. The specification of a single machine used by the prediction.

minReplicaCount

integer

Required. Immutable. The minimum number of machine replicas that this DeployedModel will always be deployed on. This value must be greater than or equal to 1.

If traffic against the DeployedModel increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.

maxReplicaCount

integer

Immutable. The maximum number of replicas that this DeployedModel may be deployed on when the traffic against it increases. If the requested value is too large, the deployment will fail. If the deployment succeeds, however, the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the DeployedModel increases beyond what its replicas can handle at maximum, a portion of the traffic will be dropped. If this value is not provided, minReplicaCount is used as the default value.
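The replica-count defaults can be applied client-side with a hypothetical helper like the one below. It follows two documented rules: minReplicaCount must be at least 1, and maxReplicaCount falls back to minReplicaCount. The check that the maximum is not below the minimum is an added assumption.

```python
from typing import Any, Dict


def normalize_replica_counts(dedicated: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a DedicatedResources dict with maxReplicaCount filled in."""
    result = dict(dedicated)
    min_count = result.get("minReplicaCount")
    if min_count is None or min_count < 1:
        raise ValueError("minReplicaCount is required and must be >= 1")
    if result.get("maxReplicaCount") is None:
        # Documented default: use minReplicaCount.
        result["maxReplicaCount"] = min_count
    if result["maxReplicaCount"] < min_count:
        # Assumption: a maximum below the minimum is treated as invalid.
        raise ValueError("maxReplicaCount must be >= minReplicaCount")
    return result


resources = normalize_replica_counts({"minReplicaCount": 2})
```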

autoscalingMetricSpecs[]

object (AutoscalingMetricSpec)

Immutable. The metric specifications that override the target value of a resource utilization metric (CPU utilization, accelerator's duty cycle, and so on). The target value defaults to 60 if not set. At most one entry is allowed per metric.

If machineSpec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and accelerator's duty cycle metrics. It scales up when either metric exceeds its target value and scales down when both metrics are below their target values. The default target value is 60 for both metrics.

If machineSpec.accelerator_count is 0, autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set.

For example, in the case of Online Prediction, if you want to override target CPU utilization to 80, you should set autoscalingMetricSpecs.metric_name to aiplatform.googleapis.com/prediction/online/cpu/utilization and autoscalingMetricSpecs.target to 80.
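The rules above can be modeled with two small functions. The first computes the effective targets, and the second applies the scale-up/scale-down rule. This sketch only mirrors the direction of scaling described here. It does not reproduce how the service computes the actual replica count. Ignoring an accelerator override when accelerator_count is 0 is an assumption, and all function names are hypothetical.

```python
from typing import Dict, List

CPU = "aiplatform.googleapis.com/prediction/online/cpu/utilization"
ACCELERATOR = "aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle"
DEFAULT_TARGET = 60


def effective_targets(accelerator_count: int, specs: List[Dict]) -> Dict[str, int]:
    """Return the target per metric, applying autoscalingMetricSpecs overrides."""
    metrics = [CPU, ACCELERATOR] if accelerator_count > 0 else [CPU]
    targets = {metric: DEFAULT_TARGET for metric in metrics}
    seen = set()
    for spec in specs:
        name = spec["metricName"]
        if name in seen:
            raise ValueError(f"at most one entry is allowed per metric: {name}")
        seen.add(name)
        if name in targets:  # assumption: overrides for unused metrics are ignored
            targets[name] = spec.get("target", DEFAULT_TARGET)
    return targets


def scaling_direction(usage: Dict[str, float], targets: Dict[str, int]) -> str:
    """Scale up if any metric exceeds its target; scale down only if all are below."""
    if any(usage[m] > t for m, t in targets.items()):
        return "up"
    if all(usage[m] < t for m, t in targets.items()):
        return "down"
    return "hold"


# Override the CPU target to 80, as in the example above.
targets = effective_targets(1, [{"metricName": CPU, "target": 80}])
```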

AutoscalingMetricSpec

The metric specification that defines the target resource utilization (CPU utilization, accelerator's duty cycle, and so on) for calculating the desired replica count.

JSON representation
{
  "metricName": string,
  "target": integer
}
Fields
metricName

string

Required. The resource metric name. Supported metrics:

  • For Online Prediction:
      • aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle
      • aiplatform.googleapis.com/prediction/online/cpu/utilization

target

integer

The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided.
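An AutoscalingMetricSpec body could be built with a hypothetical helper like the one below. It checks metricName against the metrics listed above and target against the documented 1 to 100 range. When target is omitted, the field is left out, so the default of 60 applies.

```python
from typing import Dict, Optional, Union

SUPPORTED_ONLINE_METRICS = frozenset({
    "aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle",
    "aiplatform.googleapis.com/prediction/online/cpu/utilization",
})


def make_metric_spec(metric_name: str, target: Optional[int] = None) -> Dict[str, Union[str, int]]:
    """Return an AutoscalingMetricSpec dict; omit target to use the default of 60."""
    if metric_name not in SUPPORTED_ONLINE_METRICS:
        raise ValueError(f"unsupported metric: {metric_name}")
    spec: Dict[str, Union[str, int]] = {"metricName": metric_name}
    if target is not None:
        if not 1 <= target <= 100:
            raise ValueError("target must be a percentage between 1 and 100")
        spec["target"] = target
    return spec


cpu_spec = make_metric_spec("aiplatform.googleapis.com/prediction/online/cpu/utilization", 80)
```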

Methods

create

Creates an Endpoint.

delete

Deletes an Endpoint.

deployModel

Deploys a Model into this Endpoint, creating a DeployedModel within it.

explain

Performs an online explanation.

get

Gets an Endpoint.

list

Lists Endpoints in a Location.

patch

Updates an Endpoint.

predict

Performs an online prediction.

rawPredict

Performs an online prediction with an arbitrary HTTP payload.

undeployModel

Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using.
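The methods above can be mapped to HTTP requests with a sketch like the one below. It follows common Google REST conventions: create and list act on the endpoints collection, get, patch, and delete act on one Endpoint, and custom methods are sent as POST to "{name}:{method}". The regional host pattern and the "v1" version are assumptions, as is the helper's name. Confirm the exact URL on each method's own reference page before relying on it.

```python
from typing import Optional, Tuple

API_VERSION = "v1"  # assumption: the API version this reference describes
_STANDARD_METHODS = {"get": "GET", "patch": "PATCH", "delete": "DELETE"}
_CUSTOM_METHODS = {"deployModel", "explain", "predict", "rawPredict", "undeployModel"}


def endpoint_request(project: str, location: str, method: str,
                     endpoint_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, URL) for a projects.locations.endpoints method."""
    # Assumption: regional host of the form {location}-aiplatform.googleapis.com.
    collection = (f"https://{location}-aiplatform.googleapis.com/{API_VERSION}"
                  f"/projects/{project}/locations/{location}/endpoints")
    if method == "create":
        return "POST", collection
    if method == "list":
        return "GET", collection
    if endpoint_id is None:
        raise ValueError(f"{method} requires an endpoint ID")
    resource = f"{collection}/{endpoint_id}"
    if method in _STANDARD_METHODS:
        return _STANDARD_METHODS[method], resource
    if method in _CUSTOM_METHODS:
        return "POST", f"{resource}:{method}"
    raise ValueError(f"unknown method: {method}")


verb, url = endpoint_request("my-project", "us-central1", "predict", "123")  # placeholder IDs
```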