REST Resource: projects.locations.endpoints

Resource: Endpoint

An Endpoint is the resource into which Models are deployed. Once Models are deployed, the Endpoint is called to obtain predictions and explanations.

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "deployedModels": [
    {
      object (DeployedModel)
    }
  ],
  "trafficSplit": {
    string: integer,
    ...
  },
  "etag": string,
  "labels": {
    string: string,
    ...
  },
  "createTime": string,
  "updateTime": string,
  "encryptionSpec": {
    object (EncryptionSpec)
  },
  "network": string,
  "enablePrivateServiceConnect": boolean,
  "modelDeploymentMonitoringJob": string,
  "predictRequestResponseLoggingConfig": {
    object (PredictRequestResponseLoggingConfig)
  }
}
Fields
name

string

Output only. The resource name of the Endpoint.

displayName

string

Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description

string

The description of the Endpoint.

deployedModels[]

object (DeployedModel)

Output only. The models deployed in this Endpoint. To add or remove DeployedModels use EndpointService.DeployModel and EndpointService.UndeployModel respectively.

trafficSplit

map (key: string, value: integer)

A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's ID is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
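The rules above can be checked on the client before sending an update. The following is a minimal sketch, not part of the API: the function names `validate_traffic_split` and `traffic_for` are hypothetical, and the per-value range check (0 to 100) is an assumption implied by the sum rule.

```python
def validate_traffic_split(traffic_split):
    """Raise ValueError if the map breaks the trafficSplit rules.

    traffic_split maps a DeployedModel ID (string) to an integer percentage.
    """
    if not traffic_split:
        # An empty map is valid: the Endpoint accepts no traffic.
        return
    for model_id, percent in traffic_split.items():
        # Assumption: each value is an integer percentage between 0 and 100.
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {model_id!r}: {percent!r}")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


def traffic_for(traffic_split, model_id):
    # A DeployedModel ID that is not listed in the map receives no traffic.
    return traffic_split.get(model_id, 0)
```

For example, `validate_traffic_split({"123": 80, "456": 20})` passes, while `{"123": 50}` is rejected because the values do not add up to 100.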

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
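The difference between an etag-guarded update and a blind overwrite can be illustrated with an in-memory stand-in. This is only a sketch: `FakeEndpointStore` is a hypothetical class, not a client for the service, and the error raised on a mismatch is an assumption rather than the service's actual response.

```python
import uuid


class FakeEndpointStore:
    """Hypothetical in-memory stand-in for the service, for illustration only."""

    def __init__(self, endpoint):
        self._endpoint = dict(endpoint, etag=uuid.uuid4().hex)

    def get(self):
        # Return a copy so callers cannot mutate the stored resource directly.
        return dict(self._endpoint)

    def patch(self, endpoint):
        sent_etag = endpoint.get("etag")
        # With an etag, the update is rejected if the stored copy has changed
        # since it was read (assumed behaviour for this sketch).
        if sent_etag and sent_etag != self._endpoint["etag"]:
            raise RuntimeError("etag mismatch: Endpoint changed since it was read")
        # Without an etag, the update is a blind overwrite.
        self._endpoint = dict(endpoint, etag=uuid.uuid4().hex)
        return dict(self._endpoint)
```

A caller that reads the Endpoint, modifies it, and sends it back with the etag it received is protected against overwriting a concurrent change. A caller that omits the etag is not.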

labels

map (key: string, value: string)

The labels with user-defined metadata to organize your Endpoints.

Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels.
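As an illustration of the label constraints, the sketch below checks a labels map on the client. The function names are hypothetical, and the character test is one possible reading of the rule (letters that are not uppercase, digits, underscores, and dashes); the service's own validation may differ in edge cases.

```python
MAX_LABEL_LENGTH = 64  # Unicode codepoints, per the rule above


def _is_allowed_char(ch):
    # Dashes, underscores, and digits are allowed; letters must not be
    # uppercase. Letters without case (many international scripts) pass.
    return ch in "-_" or ch.isdigit() or (ch.isalpha() and not ch.isupper())


def validate_labels(labels):
    """Raise ValueError if any key or value breaks the label rules."""
    for key, value in labels.items():
        for text in (key, value):
            # len() on a Python str counts Unicode codepoints.
            if len(text) > MAX_LABEL_LENGTH:
                raise ValueError(f"label text too long: {text!r}")
            if not all(_is_allowed_char(ch) for ch in text):
                raise ValueError(f"label text has disallowed characters: {text!r}")
```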

createTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
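Because these timestamps can carry up to nine fractional digits, which is more than Python's `datetime` stores, a client may want to keep the nanoseconds separately. The following sketch assumes the "Zulu" form shown in the examples above; `parse_timestamp` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Whole seconds, then an optional fraction of one to nine digits, then "Z".
_TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_timestamp(value):
    """Return (datetime in UTC with whole seconds, nanoseconds as int)."""
    match = _TIMESTAMP_RE.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 UTC 'Zulu' timestamp: {value!r}")
    seconds = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    seconds = seconds.replace(tzinfo=timezone.utc)
    # Right-pad the fraction to nine digits so that ".5" means 500000000 ns.
    nanos = int((match.group(2) or "").ljust(9, "0"))
    return seconds, nanos
```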

updateTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was last updated.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

encryptionSpec

object (EncryptionSpec)

Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.

network

string

Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered.

Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.

Only one of the fields, network or enablePrivateServiceConnect, can be set.

Format: projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is the network name.
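The network format and the mutual-exclusion rule can be checked before a request is sent. This is a sketch with hypothetical helper names; it assumes only what is stated above, namely a numeric project and a single-segment network name.

```python
import re

# {project} must be a project number; {network} is a single path segment.
_NETWORK_RE = re.compile(
    r"^projects/(?P<project>\d+)/global/networks/(?P<network>[^/]+)$"
)


def parse_network(name):
    """Return (project_number, network_name) or raise ValueError."""
    match = _NETWORK_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected network format: {name!r}")
    return match.group("project"), match.group("network")


def check_connectivity_fields(endpoint):
    # Only one of network or enablePrivateServiceConnect can be set.
    if endpoint.get("network") and endpoint.get("enablePrivateServiceConnect"):
        raise ValueError("set only one of network or enablePrivateServiceConnect")
```

Note that a project ID such as `my-project` is rejected by this pattern, because the format calls for a project number.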

enablePrivateServiceConnect
(deprecated)

boolean

Deprecated: If true, expose the Endpoint via private service connect.

Only one of the fields, network or enablePrivateServiceConnect, can be set.

modelDeploymentMonitoringJob

string

Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by JobService.CreateModelDeploymentMonitoringJob. Format: projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{modelDeploymentMonitoringJob}

predictRequestResponseLoggingConfig

object (PredictRequestResponseLoggingConfig)

Configures the request-response logging for online prediction.

DeployedModel

A deployment of a Model. Endpoints contain one or more DeployedModels.

JSON representation
{
  "id": string,
  "model": string,
  "modelVersionId": string,
  "displayName": string,
  "createTime": string,
  "explanationSpec": {
    object (ExplanationSpec)
  },
  "disableExplanations": boolean,
  "serviceAccount": string,
  "enableContainerLogging": boolean,
  "enableAccessLogging": boolean,
  "privateEndpoints": {
    object (PrivateEndpoints)
  },

  // Union field prediction_resources can be only one of the following:
  "dedicatedResources": {
    object (DedicatedResources)
  },
  "automaticResources": {
    object (AutomaticResources)
  },
  "sharedResources": string
  // End of list of possible types for union field prediction_resources.
}
Fields
id

string

Immutable. The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID.

This value should be 1-10 characters, and valid characters are /[0-9]/.
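A caller that supplies its own ID can check it against this rule first. A minimal sketch, with a hypothetical function name:

```python
import re

# One to ten characters, each an ASCII digit.
_DEPLOYED_MODEL_ID_RE = re.compile(r"[0-9]{1,10}")


def is_valid_deployed_model_id(value):
    # fullmatch requires the whole string to match, not just a prefix.
    return _DEPLOYED_MODEL_ID_RE.fullmatch(value) is not None
```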

model

string

Required. The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint.

The resource name may contain a version ID or version alias to specify the version. Example: projects/{project}/locations/{location}/models/{model}@2 or projects/{project}/locations/{location}/models/{model}@golden. If no version is specified, the default version is deployed.
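To illustrate the `@` suffix, the sketch below separates a model resource name from its optional version ID or alias. `split_model_version` is a hypothetical helper, and treating an empty suffix as an error is an assumption of this sketch.

```python
def split_model_version(model):
    """Return (model_name, version), where version is None if unspecified."""
    name, separator, version = model.partition("@")
    if separator and not version:
        # Assumption: a trailing "@" with nothing after it is a mistake.
        raise ValueError(f"empty version in model name: {model!r}")
    # Without "@", the default version of the Model is deployed.
    return name, (version if separator else None)
```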

modelVersionId

string

Output only. The version ID of the model that is deployed.

displayName

string

The display name of the DeployedModel. If not provided upon creation, the Model's displayName is used.

createTime

string (Timestamp format)

Output only. Timestamp when the DeployedModel was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

explanationSpec

object (ExplanationSpec)

Explanation configuration for this DeployedModel.

When deploying a Model using EndpointService.DeployModel, this value overrides the value of Model.explanation_spec. All fields of explanationSpec are optional in the request. If a field of explanationSpec is not populated, the value of the same field of Model.explanation_spec is inherited. If the corresponding Model.explanation_spec is not populated, all fields of the explanationSpec will be used for the explanation configuration.
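The override and inheritance behaviour described above can be pictured as a field-by-field merge. The sketch below is an interpretation, not the service's implementation: it assumes the merge happens at the top level of the spec, that `None` stands for "not populated", and `effective_explanation_spec` is a hypothetical name.

```python
def effective_explanation_spec(model_spec, deployed_spec):
    """Combine Model.explanation_spec with DeployedModel.explanationSpec.

    Both arguments are dicts, or None when the spec is not populated.
    """
    if not model_spec:
        # Nothing to inherit: the DeployedModel's spec is used as given.
        return dict(deployed_spec or {})
    merged = dict(model_spec)
    for field, value in (deployed_spec or {}).items():
        if value is not None:
            # A populated field on the DeployedModel overrides the Model's.
            merged[field] = value
    return merged
```

For example, if the request populates only one field of explanationSpec, the remaining fields come from the Model's spec.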

disableExplanations

boolean

If true, deploys the model without the explanation feature, regardless of whether Model.explanation_spec or explanationSpec is set.

serviceAccount

string

The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.

Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

enableContainerLogging

boolean

If true, the container of the DeployedModel instances will send stderr and stdout streams to Cloud Logging.

Only supported for custom-trained Models and AutoML Tabular Models.

enableAccessLogging

boolean

If true, online prediction access logs are sent to Cloud Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request.

Note that logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.

privateEndpoints

object (PrivateEndpoints)

Output only. Provides paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if network is configured.

Union field prediction_resources. The prediction resources (for example, the machines) that the DeployedModel uses. The user is billed for the resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types. See Model.supported_deployment_resources_types. Required except for Large Model Deploy use cases. prediction_resources can be only one of the following:
dedicatedResources

object (DedicatedResources)

A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration.

automaticResources

object (AutomaticResources)

A description of resources that are to a large degree decided by Vertex AI and that require only modest additional configuration.

sharedResources

string

The resource name of the shared DeploymentResourcePool to deploy on. Format: projects/{project}/locations/{location}/deploymentResourcePools/{deploymentResourcePool}
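Because prediction_resources is a union field, a DeployedModel body should carry at most one of the three members. The sketch below checks this on a JSON-style dict; `prediction_resources_kind` is a hypothetical helper, and returning `None` when no member is set reflects the stated exception for Large Model Deploy use cases.

```python
_PREDICTION_RESOURCE_FIELDS = (
    "dedicatedResources",
    "automaticResources",
    "sharedResources",
)


def prediction_resources_kind(deployed_model):
    """Return the name of the union member that is set, or None."""
    present = [f for f in _PREDICTION_RESOURCE_FIELDS if f in deployed_model]
    if len(present) > 1:
        raise ValueError(f"only one of {present} can be set")
    return present[0] if present else None
```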

PrivateEndpoints

The PrivateEndpoints proto provides paths for users to send requests privately. To send requests via private services access, use predictHttpUri, explainHttpUri, or healthHttpUri. To send requests via private service connect, use serviceAttachment.

JSON representation
{
  "predictHttpUri": string,
  "explainHttpUri": string,
  "healthHttpUri": string,
  "serviceAttachment": string
}
Fields
predictHttpUri

string

Output only. HTTP(S) path to send prediction requests.

explainHttpUri

string

Output only. HTTP(S) path to send explain requests.

healthHttpUri

string

Output only. HTTP(S) path to send health check requests.

serviceAttachment

string

Output only. The name of the service attachment resource. Populated if private service connect is enabled.

PredictRequestResponseLoggingConfig

Configuration for logging request-response to a BigQuery table.

JSON representation
{
  "enabled": boolean,
  "samplingRate": number,
  "bigqueryDestination": {
    object (BigQueryDestination)
  }
}
Fields
enabled

boolean

Whether logging is enabled.

samplingRate

number

Fraction of requests to be logged, expressed as a value in the range (0, 1].

bigqueryDestination

object (BigQueryDestination)

BigQuery table for logging. If only a project is given, a new dataset is created with the name logging_<endpoint-display-name>_<endpoint-id>, where <endpoint-display-name> is made BigQuery-dataset-name compatible (for example, most special characters become underscores). If no table name is given, a new table is created with the name request_response_logging.
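A logging configuration can be assembled as a plain dict matching the JSON representation above. This is a sketch: `make_logging_config` is a hypothetical helper, and the `outputUri` key inside bigqueryDestination is an assumption taken from the BigQueryDestination type, which is not described in this section.

```python
def make_logging_config(sampling_rate, output_uri=None, enabled=True):
    """Build a PredictRequestResponseLoggingConfig-shaped dict."""
    # samplingRate is a fraction in (0, 1]: zero is excluded, one is allowed.
    if not 0 < sampling_rate <= 1:
        raise ValueError(f"samplingRate must be in (0, 1], got {sampling_rate!r}")
    config = {"enabled": enabled, "samplingRate": sampling_rate}
    if output_uri is not None:
        # Assumption: BigQueryDestination carries its target in "outputUri".
        config["bigqueryDestination"] = {"outputUri": output_uri}
    return config
```

For example, `make_logging_config(0.1, "bq://my-project")` describes logging one request in ten, with the dataset and table left for the service to create.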

Methods

countTokens

Performs token counting.

create

Creates an Endpoint.

delete

Deletes an Endpoint.

deployModel

Deploys a Model into this Endpoint, creating a DeployedModel within it.

explain

Performs an online explanation.

generateContent

Generates content with multimodal inputs.

get

Gets an Endpoint.

getIamPolicy

Gets the access control policy for a resource.

list

Lists Endpoints in a Location.

mutateDeployedModel

Updates an existing deployed model.

patch

Updates an Endpoint.

predict

Performs an online prediction.

rawPredict

Performs an online prediction with an arbitrary HTTP payload.

serverStreamingPredict

Performs a server-side streaming online prediction request for Vertex LLM streaming.

setIamPolicy

Sets the access control policy on the specified resource.

streamGenerateContent

Generates content with multimodal inputs, with streaming support.

testIamPermissions

Returns permissions that a caller has on the specified resource.

undeployModel

Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using.
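As a rough illustration of how the custom methods in this list are addressed over REST, the sketch below builds a request URL from an Endpoint's resource name. Everything here is an assumption not stated in this section: the regional host pattern, the `v1` API version, the Endpoint name format, and the helper name `custom_method_url`. Standard methods such as get, list, create, patch, and delete are addressed differently and are not covered.

```python
def custom_method_url(project, location, endpoint_id, method, api_version="v1"):
    """Build a URL for a custom method such as predict or deployModel."""
    # Assumed Endpoint resource name format.
    name = f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"
    # Assumed regional service host; custom methods are appended after a colon.
    host = f"https://{location}-aiplatform.googleapis.com"
    return f"{host}/{api_version}/{name}:{method}"
```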