Resource: Endpoint
Models are deployed into an Endpoint, which is then called to obtain predictions and explanations.
JSON representation |
---|
{ "name": string, "displayName": string, "description": string, "deployedModels": [ { object ( |
Fields | |
---|---|
name |
Output only. The resource name of the Endpoint. |
displayName |
Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
description |
The description of the Endpoint. |
deployedModels[] |
Output only. The models deployed in this Endpoint. To add or remove DeployedModels use |
trafficSplit |
A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint should not accept any traffic at the moment. |
etag |
Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens. |
labels |
Labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. |
createTime |
Output only. Timestamp when this Endpoint was created. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
updateTime |
Output only. Timestamp when this Endpoint was last updated. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
encryptionSpec |
Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key. |
network |
Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Only one of the fields, Format: |
enablePrivateServiceConnect |
Deprecated: If true, expose the Endpoint via private service connect. Only one of the fields, |
privateServiceConnectConfig |
Optional. Configuration for private service connect.
|
modelDeploymentMonitoringJob |
Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by |
predictRequestResponseLoggingConfig |
Configures the request-response logging for online prediction. |
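
The trafficSplit rule described in the Fields table (percentages add up to 100, or the map is empty) can be checked on the client before an update is sent. The following is a minimal sketch, not part of the API: the function name `validate_traffic_split`, the sample DeployedModel IDs, and the per-entry 0 to 100 range check are assumptions made for illustration.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless the map is empty or its percentages sum to 100."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic at the moment.
        return
    for deployed_model_id, percent in traffic_split.items():
        # Assumption: each individual percentage lies between 0 and 100.
        if not 0 <= percent <= 100:
            raise ValueError(
                f"percentage for {deployed_model_id!r} is out of range: {percent}"
            )
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


# Hypothetical DeployedModel IDs, used only to exercise the check.
validate_traffic_split({"1234": 80, "5678": 20})
validate_traffic_split({})
```

A DeployedModel whose ID is absent from the map receives no traffic, so the check only needs to look at the entries that are present.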
DeployedModel
A deployment of a Model. Endpoints contain one or more DeployedModels.
JSON representation |
---|
{ "id": string, "model": string, "modelVersionId": string, "displayName": string, "createTime": string, "explanationSpec": { object ( |
Fields | |
---|---|
id |
Immutable. The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID. This value should be 1-10 characters, and valid characters are |
model |
Required. The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint. The resource name may contain version id or version alias to specify the version. Example: |
modelVersionId |
Output only. The version ID of the model that is deployed. |
displayName |
The display name of the DeployedModel. If not provided upon creation, the Model's displayName is used. |
createTime |
Output only. Timestamp when the DeployedModel was created. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
explanationSpec |
Explanation configuration for this DeployedModel. When deploying a Model using |
disableExplanations |
If true, deploy the model without the explainable feature, regardless of the existence of
serviceAccount |
The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the |
disableContainerLogging |
For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances sends stderr and stdout streams to Cloud Logging by default. Users can disable container logging by setting this flag to true. |
enableAccessLogging |
If true, online prediction access logs are sent to Cloud Logging. These logs resemble standard server access logs and contain information such as the timestamp and latency of each prediction request. Note that logs may incur a cost, especially if your project receives prediction requests at a high queries-per-second (QPS) rate. Estimate your costs before enabling this option. |
privateEndpoints |
Output only. Provides paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if
Union field prediction_resources. The prediction resources (for example, the machine) that the DeployedModel uses. The user is billed for the resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types. See Model.supported_deployment_resources_types. Required except for Large Model Deploy use cases. prediction_resources can be only one of the following: |
|
dedicatedResources |
A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration. |
automaticResources |
A description of resources that are, to a large degree, decided by Vertex AI and require only modest additional configuration. |
sharedResources |
The resource name of the shared DeploymentResourcePool to deploy on. Format: |
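
Because prediction_resources is a union field, a DeployedModel body may set at most one of dedicatedResources, automaticResources, and sharedResources. The sketch below shows one way to check this on a plain dictionary before sending it; the helper name `prediction_resource_kind` and the placeholder field values are hypothetical and are not part of the API.

```python
from typing import Any, Dict, Optional

# The three members of the prediction_resources union field.
PREDICTION_RESOURCE_FIELDS = (
    "dedicatedResources",
    "automaticResources",
    "sharedResources",
)


def prediction_resource_kind(deployed_model: Dict[str, Any]) -> Optional[str]:
    """Return the single union member that is set, or None if none is set.

    Raises ValueError when more than one member is present.
    """
    present = [f for f in PREDICTION_RESOURCE_FIELDS if f in deployed_model]
    if len(present) > 1:
        raise ValueError(f"only one of {present} may be set")
    return present[0] if present else None


# Placeholder values; real request bodies carry the objects described above.
body = {"model": "placeholder-model-name", "automaticResources": {}}
print(prediction_resource_kind(body))
```

Returning None instead of raising when no member is set leaves room for the Large Model Deploy case, where the field is not required.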
PrivateEndpoints
The PrivateEndpoints proto provides paths for users to send requests privately. To send requests via private services access, use predictHttpUri, explainHttpUri, or healthHttpUri. To send requests via private service connect, use serviceAttachment.
JSON representation |
---|
{ "predictHttpUri": string, "explainHttpUri": string, "healthHttpUri": string, "serviceAttachment": string } |
Fields | |
---|---|
predictHttpUri |
Output only. Http(s) path to send prediction requests. |
explainHttpUri |
Output only. Http(s) path to send explain requests. |
healthHttpUri |
Output only. Http(s) path to send health check requests. |
serviceAttachment |
Output only. The name of the service attachment resource. Populated if private service connect is enabled. |
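
With private services access, the request kind determines which of the three URI fields to read. The following sketch illustrates that lookup on a dictionary shaped like the JSON representation above; the function name `private_services_access_uri` and the example URIs are hypothetical placeholders, not values returned by the service.

```python
from typing import Dict

# Maps a request kind to the PrivateEndpoints field that holds its path.
_URI_FIELD_BY_KIND = {
    "predict": "predictHttpUri",
    "explain": "explainHttpUri",
    "health": "healthHttpUri",
}


def private_services_access_uri(private_endpoints: Dict[str, str], kind: str) -> str:
    """Return the path for the given request kind, or raise if it is unavailable."""
    if kind not in _URI_FIELD_BY_KIND:
        raise ValueError(f"unknown request kind: {kind!r}")
    field = _URI_FIELD_BY_KIND[kind]
    uri = private_endpoints.get(field)
    if not uri:
        # Output-only fields may be absent, for example when the Endpoint
        # is exposed through private service connect instead.
        raise LookupError(f"{field} is not populated")
    return uri


# Placeholder values for illustration only.
example = {
    "predictHttpUri": "http://placeholder.invalid/predict",
    "healthHttpUri": "http://placeholder.invalid/health",
}
print(private_services_access_uri(example, "predict"))
```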
PredictRequestResponseLoggingConfig
Configuration for logging request-response to a BigQuery table.
JSON representation |
---|
{
"enabled": boolean,
"samplingRate": number,
"bigqueryDestination": {
object ( |
Fields | |
---|---|
enabled |
Whether logging is enabled. |
samplingRate |
Percentage of requests to be logged, expressed as a fraction in the range (0, 1]. |
bigqueryDestination |
BigQuery table for logging. If only given a project, a new dataset will be created with name |
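
The samplingRate field takes a fraction in the half-open range (0, 1], so 0 is rejected and 1 logs every request. The sketch below validates a rate and simulates a per-request sampling decision. It is an illustration only: the function names are hypothetical, and drawing an independent uniform random number per request is an assumption about how sampling could work, not a description of the service's behavior.

```python
import random


def validate_sampling_rate(rate: float) -> float:
    """Return the rate unchanged if it lies in (0, 1], else raise ValueError."""
    if not (0.0 < rate <= 1.0):
        raise ValueError(f"samplingRate must be in (0, 1], got {rate}")
    return rate


def should_log(rate: float, rng: random.Random) -> bool:
    """Decide whether one request is logged (assumed uniform random sampling)."""
    # rng.random() returns a value in [0.0, 1.0), so a rate of 1.0 always logs.
    return rng.random() < validate_sampling_rate(rate)


rng = random.Random(0)  # Seeded so repeated runs make the same decisions.
logged = sum(should_log(0.25, rng) for _ in range(1000))
print(f"logged {logged} of 1000 simulated requests")
```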
Methods |
|
---|---|
|
Return a list of tokens based on the input text. |
|
Perform token counting. |
|
Creates an Endpoint. |
|
Deletes an Endpoint. |
|
Deploys a Model into this Endpoint, creating a DeployedModel within it. |
|
Perform a unary online prediction request to a gRPC model server for Vertex first-party products and frameworks. |
|
Perform a unary online prediction request to a gRPC model server for custom containers. |
|
Perform an online explanation. |
|
Generate content with multimodal inputs. |
|
Gets an Endpoint. |
|
Lists Endpoints in a Location. |
|
Updates an existing deployed model. |
|
Updates an Endpoint. |
|
Perform an online prediction. |
|
Perform an online prediction with an arbitrary HTTP payload. |
|
Perform a server-side streaming online prediction request for Vertex LLM streaming. |
|
Generate content with multimodal inputs with streaming support. |
|
Perform a streaming online prediction with an arbitrary HTTP payload. |
|
Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using. |