Choosing a machine type for online prediction

AI Platform Prediction allocates nodes to handle online prediction requests sent to a model version. When you deploy a model version, you can customize the type of virtual machine that AI Platform Prediction uses for these nodes.

Machine types differ in the computing resources they provide and in the features they support. By selecting a machine type with more computing resources, you can serve predictions with lower latency or handle more prediction requests at the same time.

Available machine types

The default machine type, mls1-c1-m2, is generally available for online prediction. You can alternatively deploy a model version with one of the other machine types, which are available in beta.

The following table compares the available machine types:

| Name | Availability | vCPUs | Memory (GB) | Supports GPUs? | ML framework support | Max model size |
| --- | --- | --- | --- | --- | --- | --- |
| mls1-c1-m2 (default) | Generally available | 1 | 2 | No | All types of model artifacts supported by AI Platform Prediction | 500 MB |
| mls1-c4-m2 | Beta | 4 | 2 | No | All types of model artifacts supported by AI Platform Prediction | 500 MB |
| n1-standard-2 | Beta | 2 | 7.5 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-standard-4 | Beta | 4 | 15 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-standard-8 | Beta | 8 | 30 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-standard-16 | Beta | 16 | 60 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-standard-32 | Beta | 32 | 120 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highmem-2 | Beta | 2 | 13 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highmem-4 | Beta | 4 | 26 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highmem-8 | Beta | 8 | 52 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highmem-16 | Beta | 16 | 104 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highmem-32 | Beta | 32 | 208 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highcpu-2 | Beta | 2 | 1.8 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highcpu-4 | Beta | 4 | 3.6 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highcpu-8 | Beta | 8 | 7.2 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highcpu-16 | Beta | 16 | 14.4 | Yes | Only TensorFlow SavedModel | 2 GB |
| n1-highcpu-32 | Beta | 32 | 28.8 | Yes | Only TensorFlow SavedModel | 2 GB |

Learn about pricing for each machine type. For detailed specifications of Compute Engine (N1) machine types, see the Compute Engine documentation.

Specifying a machine type

You can specify a machine type choice when you create a model version. If you don't specify a machine type, your model version defaults to using mls1-c1-m2 for its nodes.

The following instructions highlight how to specify a machine type when you create a model version. They use the mls1-c4-m2 machine type as an example. To learn about the full process of creating a model version, read the guide to deploying models.

GCP Console

On the Create version page, open the Machine type drop-down list and select AI Platform machine types > Quad Core CPU (BETA).

gcloud

After you have uploaded your model artifacts to Cloud Storage and created a model resource, you can create a model version that uses the mls1-c4-m2 machine type by using the beta component of the gcloud command-line tool:

gcloud beta ai-platform versions create version_name \
  --model model_name \
  --origin gs://model-directory-uri \
  --runtime-version 1.14 \
  --python-version 3.5 \
  --framework ml-framework-name \
  --machine-type mls1-c4-m2
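
After the version is created, you can confirm which machine type it uses. A quick check with the gcloud tool, assuming the names used above; the response should include the machineType field:

gcloud ai-platform versions describe version_name \
  --model model_name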

Python

This example uses the Google APIs Client Library for Python. Before you run the following code sample, you must set up authentication.

After you have uploaded your model artifacts to Cloud Storage and created a model resource, send a request to your model's projects.models.versions.create method and specify the machineType field in your request body:

from googleapiclient import discovery

# Build a client for the AI Platform Training and Prediction API.
ml = discovery.build('ml', 'v1')

# Request body for the new version; machineType selects mls1-c4-m2.
request_dict = {
    'name': 'version_name',
    'deploymentUri': 'gs://model-directory-uri',
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5',
    'framework': 'ML_FRAMEWORK_NAME',
    'machineType': 'mls1-c4-m2'
}

# Create the version under the existing model resource.
request = ml.projects().models().versions().create(
    parent='projects/project-name/models/model_name',
    body=request_dict
)
response = request.execute()
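
Note that versions.create returns a long-running operation rather than a finished version. A minimal sketch of waiting for deployment to complete, assuming the response dictionary above:

import time

# Poll the long-running operation until the version is deployed.
operation_name = response['name']
while True:
    operation = ml.projects().operations().get(name=operation_name).execute()
    if operation.get('done'):
        break
    time.sleep(30)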

Using GPUs for online prediction

If you use one of the Compute Engine (N1) machine types for your model version, you can optionally add GPUs to accelerate each prediction node. You can only use one type of GPU for your model version, and there are limitations on the number of GPUs you can add depending on which machine type you are using.

The following table shows the GPUs available for online prediction and how many of each type of GPU you can use with each Compute Engine machine type:

Valid numbers of GPUs for each machine type:

| Machine type | NVIDIA Tesla K80 | NVIDIA Tesla P4 | NVIDIA Tesla P100 | NVIDIA Tesla T4 | NVIDIA Tesla V100 |
| --- | --- | --- | --- | --- | --- |
| n1-standard-2 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-standard-4 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-standard-8 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-standard-16 | 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 2, 4, 8 |
| n1-standard-32 | 4, 8 | 2, 4 | 2, 4 | 2, 4 | 4, 8 |
| n1-highmem-2 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highmem-4 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highmem-8 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highmem-16 | 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 2, 4, 8 |
| n1-highmem-32 | 4, 8 | 2, 4 | 2, 4 | 2, 4 | 4, 8 |
| n1-highcpu-2 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highcpu-4 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highcpu-8 | 1, 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4, 8 |
| n1-highcpu-16 | 2, 4, 8 | 1, 2, 4 | 1, 2, 4 | 1, 2, 4 | 2, 4, 8 |
| n1-highcpu-32 | 4, 8 | 2, 4 | 2, 4 | 2, 4 | 4, 8 |

GPUs are optional and incur additional costs. Legacy (MLS1) machine types do not support GPUs.

Specifying GPUs

Specify GPUs when you create a model version. AI Platform Prediction allocates the number and type of GPU that you specify for each prediction node. When you use GPUs, you must manually scale your version's prediction nodes: you can later change how many nodes are running, but you cannot currently use automatic scaling with GPUs.

The following instructions show how to specify GPUs for online prediction by using an n1-highmem-32 machine type with 4 NVIDIA Tesla K80 GPUs for each of the model version's prediction nodes:

GCP Console

On the Create version page, open the Machine type drop-down list and select High-memory > n1-highmem-32. In the Accelerator type field select NVIDIA_TESLA_K80. In the Accelerator count field select 4.

gcloud

After you have uploaded your TensorFlow SavedModel to Cloud Storage and created a model resource in the us-central1 region, create a version by using the beta component of the gcloud command-line tool and specify the --accelerator flag:

gcloud beta ai-platform versions create version_name \
  --model model_name \
  --origin gs://model-directory-uri \
  --runtime-version 1.14 \
  --python-version 3.5 \
  --framework tensorflow \
  --machine-type n1-highmem-32 \
  --accelerator 4,nvidia-tesla-k80

Note that the accelerator name is specified in lowercase with hyphens between words.

Python

This example uses the Google APIs Client Library for Python. Before you run the following code sample, you must set up authentication.

After you have uploaded your TensorFlow SavedModel to Cloud Storage and created a model resource in the us-central1 region, send a request to your model's projects.models.versions.create method and specify the machineType and acceleratorConfig fields in your request body:

from googleapiclient import discovery

# Build a client for the AI Platform Training and Prediction API.
ml = discovery.build('ml', 'v1')

# Request body for the new version: an n1-highmem-32 machine with
# four NVIDIA Tesla K80 GPUs attached to each prediction node.
request_dict = {
    'name': 'version_name',
    'deploymentUri': 'gs://model-directory-uri',
    'runtimeVersion': '1.14',
    'pythonVersion': '3.5',
    'framework': 'TENSORFLOW',
    'machineType': 'n1-highmem-32',
    'acceleratorConfig': {
      'count': 4,
      'type': 'NVIDIA_TESLA_K80'
    }
}
request = ml.projects().models().versions().create(
    parent='projects/project-name/models/model_name',
    body=request_dict
)
response = request.execute()

Note that the accelerator name is specified in uppercase with underscores between words.
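
Because GPU-backed versions cannot use automatic scaling (see Scaling prediction nodes below), you may also want to pin the number of nodes in the same request. A hedged sketch that extends the request body above with the manualScaling field:

# Run exactly two prediction nodes; GPU versions require manual scaling.
request_dict['manualScaling'] = {'nodes': 2}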

Differences between machine types

Besides providing different amounts of computing resources, machine types also vary in their support for certain AI Platform Prediction features. The following table provides an overview of the differences between Compute Engine (N1) machine types and legacy (MLS1) machine types:

| | Compute Engine (N1) machine types | Legacy (MLS1) machine types |
| --- | --- | --- |
| Regions | us-central1 | All AI Platform Prediction regions |
| Types of ML artifacts | TensorFlow SavedModels | All AI Platform model artifacts |
| Runtime versions | 1.11 or later | All available AI Platform runtime versions |
| Max model size | 2 GB | 500 MB |
| Logging | No stream logging | All types of logging |
| Automatic scaling | Minimum nodes = 1 | Minimum nodes = 0 |
| Manual scaling | Can update number of nodes | Cannot update number of nodes after creating the model version |

The following sections provide detailed explanations about the differences between machine types.

Regional availability

Compute Engine (N1) machine types are currently only available when you deploy your model in the us-central1 region.

You can use legacy (MLS1) machine types in all regions available for online prediction.
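
For example, to serve a version on an N1 machine type, the model itself must be created in us-central1 first. A sketch, assuming the model name used earlier:

gcloud ai-platform models create model_name \
  --regions us-central1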

Batch prediction support

Model versions that use the mls1-c4-m2 machine type do not support batch prediction.

ML framework support

If you use one of the Compute Engine (N1) machine types, you must create your model version with a TensorFlow SavedModel and specify TENSORFLOW for the framework field.

For legacy (MLS1) machine types, you can use any type of exported machine learning model that AI Platform Prediction supports.
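
If you are unsure whether your artifacts meet this requirement, the following minimal sketch exports a trivial graph in SavedModel format with TensorFlow 1.x (matching runtime version 1.14). The graph, tensor names, and export directory are illustrative only:

import tensorflow as tf  # TensorFlow 1.x

# A trivial graph: y = x . w
x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
w = tf.Variable(tf.ones([3, 1]))
y = tf.matmul(x, w, name='y')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Write a SavedModel directory that you can upload to Cloud Storage.
    tf.saved_model.simple_save(
        sess, 'exported_model', inputs={'x': x}, outputs={'y': y})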

Runtime version support

If you use a Compute Engine (N1) machine type, you must use runtime version 1.11 or later for your model version.

If you use a legacy (MLS1) machine type, you can use any available AI Platform runtime version.

Max model size

The model artifacts that you provide when you create a model version must have a total file size less than 500 MB if you use a legacy (MLS1) machine type. The total file size can be up to 2 GB if you use a Compute Engine (N1) machine type.
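
One quick way to check the total size of your artifacts, assuming they are already uploaded to Cloud Storage, is gsutil's summarize flag:

gsutil du -sh gs://model-directory-uri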

Logging predictions

Compute Engine (N1) machine types do not support stream logging of prediction nodes' stderr and stdout streams.

Legacy (MLS1) machine types support all types of online prediction logging.
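
Stream logging is configured on the model rather than on individual versions. A hedged sketch using the --enable-logging flag of gcloud ai-platform models create; this takes effect only for versions on legacy (MLS1) machine types:

gcloud ai-platform models create model_name \
  --regions us-central1 \
  --enable-logging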

Scaling prediction nodes

Automatic scaling and manual scaling of prediction nodes both have different constraints depending on whether you use a Compute Engine (N1) machine type or a legacy (MLS1) machine type.

Automatic scaling

If you use a Compute Engine (N1) machine type with automatic scaling, your model version must always have at least one node running. In other words, the version's autoScaling.minNodes field defaults to 1 and cannot be less than 1.

If you use GPUs for your model version, you cannot use automatic scaling. You must use manual scaling.

If you use a legacy (MLS1) machine type, your model version can scale to zero nodes when it doesn't receive traffic. (autoScaling.minNodes can be set to 0, and it is set to 0 by default.)
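
A hedged sketch of setting this field explicitly, extending a request body like the ones shown earlier; on a Compute Engine (N1) machine type, a value below 1 would be rejected:

# Keep at least one node running at all times. autoScaling and
# manualScaling are alternatives; set only one of them.
request_dict['autoScaling'] = {'minNodes': 1}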

Manual scaling

If you use a Compute Engine (N1) machine type with manual scaling, you can update the number of prediction nodes running at any time by using the projects.models.versions.patch API method.

If you use a legacy (MLS1) machine type with manual scaling, you cannot update the number of prediction nodes after you create the model version. If you want to change the number of nodes, you must delete the version and create a new one.
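
A minimal sketch of such an update with the Python client used earlier, assuming a version that runs on a Compute Engine (N1) machine type:

# Change the number of prediction nodes on an existing version.
request = ml.projects().models().versions().patch(
    name='projects/project-name/models/model_name/versions/version_name',
    updateMask='manualScaling.nodes',
    body={'manualScaling': {'nodes': 3}}
)
response = request.execute()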
