Class Endpoint (1.5.0)

Endpoint(
    endpoint_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
)

Retrieves an endpoint resource.

Parameters

Name	Description
endpoint_name	`str` Required. A fully-qualified endpoint resource name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/456" or "456" when project and location are initialized or passed.
project	`str` Optional. Project to retrieve endpoint from. If not set, project set in aiplatform.init will be used.
location	`str` Optional. Location to retrieve endpoint from. If not set, location set in aiplatform.init will be used.
credentials	`auth_credentials.Credentials` Optional. Custom credentials to use to retrieve this endpoint. Overrides credentials set in aiplatform.init.
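
The two accepted forms of endpoint_name can be illustrated with a small helper. This is only a sketch of the naming rule described in the table; `qualify_endpoint_name` is a hypothetical function, not part of the SDK, and how the SDK expands a bare ID internally is not shown here.

```python
import re
from typing import Optional

# Matches the fully-qualified form shown in the parameter table.
_FULL_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/endpoints/[^/]+$")


def qualify_endpoint_name(
    endpoint_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Return a fully-qualified endpoint resource name (illustrative helper)."""
    if _FULL_NAME.match(endpoint_name):
        # Already fully qualified: project and location are not needed.
        return endpoint_name
    if "/" in endpoint_name:
        raise ValueError(f"Not a valid endpoint name or ID: {endpoint_name!r}")
    if not project or not location:
        raise ValueError("A bare endpoint ID needs both project and location.")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_name}"
```

A bare ID such as "456" is therefore only meaningful together with a project and a location, whether those are passed explicitly or set through aiplatform.init.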

Inheritance

builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > Endpoint

Properties

network

The full name of the Google Compute Engine network to which this Endpoint should be peered.

Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.

Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
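
As a minimal sketch of the network name format, the helper below assembles the string from its two parts. `network_resource_name` is a hypothetical name introduced for illustration, and the checks it performs are assumptions, not validation done by the service.

```python
def network_resource_name(project_number: str, network: str) -> str:
    """Build a name of the form projects/{project}/global/networks/{network}."""
    # The format calls for a project number (e.g. 12345), not a project ID.
    if not (project_number.isascii() and project_number.isdigit()):
        raise ValueError(f"Expected a numeric project number, got {project_number!r}")
    # Assumption: the network part is a bare name without path separators.
    if not network or "/" in network:
        raise ValueError(f"Expected a bare network name, got {network!r}")
    return f"projects/{project_number}/global/networks/{network}"
```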

traffic_split

A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's ID is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
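
The rule for a valid traffic split can be checked locally before a map is sent. The function below is a sketch under the rule stated above; `validate_traffic_split` is a hypothetical helper, and rejecting negative values is an added assumption.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless the map is empty or its values add up to 100."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic at the moment.
        return
    if any(percentage < 0 for percentage in traffic_split.values()):
        raise ValueError("Traffic percentages must not be negative.")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"Traffic percentages add up to {total}, expected 100.")
```

A DeployedModel ID that is absent from the map simply receives no traffic, so only the listed values take part in the sum.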

Methods

create

create(
    display_name: str,
    description: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
    encryption_spec_key_name: Optional[str] = None,
    sync=True,
)

Creates a new endpoint.

Parameters

Name	Description
display_name	`str` Required. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
description	`str` Optional. The description of the Endpoint.
labels	`Dict[str, str]` Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
metadata	`Sequence[Tuple[str, str]]` Optional. Strings which should be sent along with the request as metadata.
project	`str` Optional. Project to create this endpoint in. If not set, project set in aiplatform.init will be used.
location	`str` Optional. Location to create this endpoint in. If not set, location set in aiplatform.init will be used.
credentials	`auth_credentials.Credentials` Optional. Custom credentials to use to create this endpoint. Overrides credentials set in aiplatform.init.
encryption_spec_key_name	`Optional[str]` Optional. The Cloud KMS resource identifier of the customer managed encryption key used to protect the model. Has the form: `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`. The key needs to be in the same region as where the compute resource is created. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
sync	`bool` Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.

Returns

Type	Description
endpoint (endpoint.Endpoint)	Created endpoint.
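
The length limits on display_name and labels can be checked before calling create. The sketch below only covers the length rules from the table (128 characters for the name, 64 for label keys and values) and does not check the allowed character set; `build_create_kwargs` is a hypothetical helper, and counting characters with `len()` is an assumption about how the limits are measured.

```python
from typing import Any, Dict, Optional

MAX_DISPLAY_NAME_LENGTH = 128  # limit stated for display_name
MAX_LABEL_LENGTH = 64  # limit stated for label keys and values


def build_create_kwargs(
    display_name: str,
    description: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Check the documented length limits and return keyword arguments for create."""
    if not display_name or len(display_name) > MAX_DISPLAY_NAME_LENGTH:
        raise ValueError("display_name must be 1 to 128 characters long.")
    for key, value in (labels or {}).items():
        if len(key) > MAX_LABEL_LENGTH or len(value) > MAX_LABEL_LENGTH:
            raise ValueError(f"Label {key!r} exceeds 64 characters.")
    kwargs: Dict[str, Any] = {"display_name": display_name}
    if description is not None:
        kwargs["description"] = description
    if labels:
        kwargs["labels"] = dict(labels)
    return kwargs
```

The result could then be passed on as `aiplatform.Endpoint.create(**build_create_kwargs("my-endpoint", labels={"team": "ml"}))`, with project and location taken from aiplatform.init.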

delete

delete(force: bool = False, sync: bool = True)

Deletes this Vertex AI Endpoint resource. If force is set to True, all models on this Endpoint will be undeployed prior to deletion.

Parameters

Name	Description
force	`bool` Optional. If force is set to True, all deployed models on this Endpoint will be undeployed first. Default is False.
sync	`bool` Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.

Exceptions

Type	Description
FailedPrecondition	If models are deployed on this Endpoint and force = False.
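
One way to avoid the FailedPrecondition error is to look at the deployed models before deleting. The sketch below uses only the list_models and delete methods documented on this page; `delete_endpoint` is a hypothetical wrapper, and the endpoint argument is any object exposing those two methods.

```python
def delete_endpoint(endpoint, force: bool = False) -> bool:
    """Delete the endpoint unless models are still deployed and force is False.

    Returns True when delete was called and False when it was skipped.
    """
    deployed = list(endpoint.list_models())
    if deployed and not force:
        # Calling delete here would raise FailedPrecondition, so skip it.
        return False
    # With force=True, deployed models are undeployed before deletion.
    endpoint.delete(force=force)
    return True
```

Returning False leaves the decision to the caller, who can undeploy selectively or retry with force=True.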

deploy

deploy(
    model: google.cloud.aiplatform.models.Model,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: int = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    machine_type: Optional[str] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    service_account: Optional[str] = None,
    explanation_metadata: Optional[
        google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata
    ] = None,
    explanation_parameters: Optional[
        google.cloud.aiplatform_v1.types.explanation.ExplanationParameters
    ] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync=True,
)

Deploys a Model to the Endpoint.

Parameters

Name	Description
deployed_model_display_name	`str` Optional. The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used.
traffic_percentage	`int` Optional. Desired traffic to newly deployed model. Defaults to 0 if there are pre-existing deployed models. Defaults to 100 if there are no pre-existing deployed models. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate new deployed model's traffic. Should not be provided if traffic_split is provided.
traffic_split	`Dict[str, int]` Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. The key for the model being deployed is "0". Should not be provided if traffic_percentage is provided.
machine_type	`str` Optional. The type of machine. If no machine type is specified, the model is deployed with automatic resources.
min_replica_count	`int` Optional. The minimum number of machine replicas this deployed model will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
max_replica_count	`int` Optional. The maximum number of replicas this deployed model may be deployed on when the traffic against it increases. If requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum may handle, a portion of the traffic will be dropped. If this value is not provided, the larger value of min_replica_count or 1 will be used. If value provided is smaller than min_replica_count, it will automatically be increased to be min_replica_count.
accelerator_type	`str` Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4.
accelerator_count	`int` Optional. The number of accelerators to attach to a worker replica.
service_account	`str` The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the `iam.serviceAccounts.actAs` permission on this service account.
explanation_metadata	`explain.ExplanationMetadata` Optional. Metadata describing the Model's input and output for explanation. Both `explanation_metadata` and `explanation_parameters` must be passed together when used. For more details, see the reference docs: http://tinyurl.com/1igh60kt
explanation_parameters	`explain.ExplanationParameters` Optional. Parameters to configure explaining for Model's predictions. For more details, see the reference docs: http://tinyurl.com/1an4zake
metadata	`Sequence[Tuple[str, str]]` Optional. Strings which should be sent along with the request as metadata.
model	`aiplatform.Model` Required. Model to be deployed.
sync	`bool` Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
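
The scaling of existing traffic described for traffic_percentage can be worked through with plain arithmetic. The sketch below is one possible way to do it, assuming integer percentages, floor rounding, and handing any remainder to the models with the largest current share; the exact rounding the SDK applies is not specified on this page, and `scale_for_new_model` is a hypothetical helper.

```python
from typing import Dict


def scale_for_new_model(
    current_split: Dict[str, int], traffic_percentage: int
) -> Dict[str, int]:
    """Scale existing shares down so the new model ("0") gets traffic_percentage."""
    if not 0 <= traffic_percentage <= 100:
        raise ValueError("traffic_percentage must be between 0 and 100.")
    remaining = 100 - traffic_percentage
    # Scale every existing share proportionally, rounding down.
    new_split = {
        model_id: share * remaining // 100
        for model_id, share in current_split.items()
    }
    # Rounding down can leave a few points unassigned; hand them out one by one,
    # starting with the models that currently carry the most traffic.
    leftover = remaining - sum(new_split.values())
    for model_id in sorted(new_split, key=current_split.get, reverse=True):
        if leftover == 0:
            break
        new_split[model_id] += 1
        leftover -= 1
    # "0" is the key for the model being deployed.
    new_split["0"] = traffic_percentage
    return new_split
```

For an endpoint whose current split is {"111": 60, "222": 40}, requesting 20 percent for the new model leaves 80 to share, which gives 48 and 32 for the existing models. The helper assumes the current shares already add up to 100.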

explain

explain(
    instances: List[Dict],
    parameters: Optional[Dict] = None,
    deployed_model_id: Optional[str] = None,
)

Make a prediction with explanations against this Endpoint.

Example usage:

response = my_endpoint.explain(instances=[...])
my_explanations = response.explanations

Parameters

Name	Description
instances	`List` Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When the limit is exceeded, the prediction call fails for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the `instance_schema_uri` of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Model (google.cloud.aiplatform.v1beta1.DeployedModel.model) behind the Endpoint's DeployedModels.
parameters	`Dict` Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the `parameters_schema_uri` of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Model (google.cloud.aiplatform.v1beta1.DeployedModel.model) behind the Endpoint's DeployedModels.
deployed_model_id	`str` Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split.

Returns

Type	Description
prediction	Prediction with returned predictions, explanations and Model Id.
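
Since the returned object carries both predictions and explanations, the two can be paired up per instance. The sketch below assumes the response exposes them as `predictions` and `explanations` attributes in matching order; `explain_pairs` is a hypothetical helper, and the endpoint argument is any object with the explain method documented above.

```python
from typing import Any, Dict, List, Optional, Tuple


def explain_pairs(
    endpoint,
    instances: List[Dict],
    parameters: Optional[Dict] = None,
    deployed_model_id: Optional[str] = None,
) -> List[Tuple[Any, Any]]:
    """Call explain once and return (prediction, explanation) pairs."""
    response = endpoint.explain(
        instances=instances,
        parameters=parameters,
        deployed_model_id=deployed_model_id,
    )
    # Assumption: both sequences are ordered like the input instances.
    return list(zip(response.predictions, response.explanations))
```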

list

list(
    filter: Optional[str] = None,
    order_by: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
)

List all Endpoint resource instances.

Example Usage:

aiplatform.Endpoint.list(
    filter='labels.my_label="my_label_value" OR display_name!="old_endpoint"',
)

Parameters

Name	Description
filter	`str` Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
order_by	`str` Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: `display_name`, `create_time`, `update_time`
project	`str` Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
location	`str` Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
credentials	`auth_credentials.Credentials` Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
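
Filter strings like the one in the example usage can be assembled from small pieces. The helpers below are hypothetical and cover only the constructs visible in that example (a label match, a display name comparison, and OR); they assume the usual `!=` inequality operator, and the full filter grammar is not described on this page.

```python
def _quoted(value: str) -> str:
    """Wrap a value in double quotes, refusing values that contain one."""
    if '"' in value:
        raise ValueError(f"Value must not contain a double quote: {value!r}")
    return f'"{value}"'


def label_equals(key: str, value: str) -> str:
    """Clause matching endpoints whose label `key` equals `value`."""
    return f"labels.{key}={_quoted(value)}"


def display_name_not(name: str) -> str:
    """Clause matching endpoints whose display_name differs from `name`."""
    return f"display_name!={_quoted(name)}"


def any_of(*clauses: str) -> str:
    """Join clauses so that an endpoint matching any one of them is listed."""
    return " OR ".join(clauses)
```

The resulting string would be passed as the filter argument, for instance `aiplatform.Endpoint.list(filter=any_of(label_equals("team", "ml"), display_name_not("old_endpoint")), order_by="create_time desc")`.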

list_models

list_models()

Returns a list of the models deployed to this Endpoint.

Returns

Type	Description
deployed_models (Sequence[aiplatform.gapic.DeployedModel])	A list of the models deployed in this Endpoint.
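
The returned sequence can be turned into a lookup from DeployedModel ID to display name, which is convenient when building a traffic_split. The sketch assumes each DeployedModel exposes `id` and `display_name` attributes; `deployed_model_names` is a hypothetical helper.

```python
from typing import Dict


def deployed_model_names(endpoint) -> Dict[str, str]:
    """Map each DeployedModel ID on the endpoint to its display name."""
    # Assumption: DeployedModel objects carry `id` and `display_name` fields.
    return {model.id: model.display_name for model in endpoint.list_models()}
```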

predict

predict(instances: List, parameters: Optional[Dict] = None)

Make a prediction against this Endpoint.

Parameters

Name	Description
instances	`List` Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request. When the limit is exceeded, the prediction call fails for AutoML Models; for customer-created Models, the behaviour is as documented by that Model. The schema of any single instance may be specified via the `instance_schema_uri` of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Model (google.cloud.aiplatform.v1beta1.DeployedModel.model) behind the Endpoint's DeployedModels.
parameters	`Dict` Optional. The parameters that govern the prediction. The schema of the parameters may be specified via the `parameters_schema_uri` of the PredictSchemata (google.cloud.aiplatform.v1beta1.Model.predict_schemata) of the Model (google.cloud.aiplatform.v1beta1.DeployedModel.model) behind the Endpoint's DeployedModels.

Returns

Type	Description
prediction	Prediction with returned predictions and Model Id.
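
Because a DeployedModel may cap the number of instances per request, large inputs can be sent in chunks. The sketch below assumes the returned object exposes a `predictions` sequence and that the caller knows a suitable batch size; `predict_in_batches` is a hypothetical helper, and the limit itself depends on the deployed model.

```python
from typing import Any, Dict, List, Optional


def predict_in_batches(
    endpoint,
    instances: List[Any],
    batch_size: int,
    parameters: Optional[Dict] = None,
) -> List[Any]:
    """Call predict on slices of at most batch_size instances and join the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1.")
    predictions: List[Any] = []
    for start in range(0, len(instances), batch_size):
        chunk = instances[start:start + batch_size]
        response = endpoint.predict(instances=chunk, parameters=parameters)
        # Assumption: predictions come back in the same order as the chunk.
        predictions.extend(response.predictions)
    return predictions
```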

undeploy

undeploy(
    deployed_model_id: str,
    traffic_split: Optional[Dict[str, int]] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync=True,
)

Undeploys a deployed model.

Proportionally adjusts the traffic_split among the remaining deployed models of the endpoint.

Parameters

Name	Description
deployed_model_id	`str` Required. The ID of the DeployedModel to be undeployed from the Endpoint.
traffic_split	`Dict[str, int]` Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
metadata	`Sequence[Tuple[str, str]]` Optional. Strings which should be sent along with the request as metadata.
sync	`bool` Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
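
The proportional adjustment after an undeploy can be illustrated with a short calculation. This is a sketch under assumptions: integer percentages, floor rounding, and the remainder given to the models with the largest share; the SDK's exact rounding is not specified on this page, and `rebalance_after_undeploy` is a hypothetical helper.

```python
from typing import Dict


def rebalance_after_undeploy(
    current_split: Dict[str, int], deployed_model_id: str
) -> Dict[str, int]:
    """Spread the removed model's traffic proportionally over the remaining models."""
    remaining = {
        model_id: share
        for model_id, share in current_split.items()
        if model_id != deployed_model_id
    }
    total = sum(remaining.values())
    if total == 0:
        # Nothing to scale from: return an empty map (no traffic accepted).
        return {}
    # Scale each remaining share up so the total returns to 100, rounding down.
    new_split = {
        model_id: share * 100 // total for model_id, share in remaining.items()
    }
    leftover = 100 - sum(new_split.values())
    for model_id in sorted(new_split, key=remaining.get, reverse=True):
        if leftover == 0:
            break
        new_split[model_id] += 1
        leftover -= 1
    return new_split
```

Removing a model that held 20 percent from {"a": 50, "b": 30, "c": 20} scales the other two from a base of 80 up to 100, keeping their 5:3 ratio as closely as whole numbers allow.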

undeploy_all

undeploy_all(sync: bool = True)

Undeploys every model deployed to this Endpoint.

Parameters

Name	Description
sync	`bool` Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be returned immediately and synced when the Future has completed.
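
The difference between sync=True and sync=False follows the general pattern of running work inline versus handing it to a background Future. The sketch below shows that pattern with the standard library only; it is not the SDK's implementation, and `run_operation` is a hypothetical function.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_operation(
    operation: Callable[[], T],
    sync: bool = True,
    executor: Optional[ThreadPoolExecutor] = None,
) -> "Future[T]":
    """Run operation inline (sync=True) or in the background (sync=False)."""
    if sync:
        # Block until the work is finished, then wrap the result.
        done: "Future[T]" = Future()
        done.set_result(operation())
        return done
    if executor is None:
        raise ValueError("An executor is required when sync is False.")
    # Return immediately; the caller collects the result later.
    return executor.submit(operation)
```

With sync=False the caller gets control back right away and only blocks when it asks for the result, which is the trade-off the sync parameter exposes on the methods above.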