Configure compute resources for prediction

Vertex AI allocates nodes to handle online and batch predictions. When you deploy a custom-trained model or AutoML model to an Endpoint resource to serve online predictions or when you request batch predictions, you can customize the type of virtual machine that the prediction service uses for these nodes. You can optionally configure prediction nodes to use GPUs.

Machine types differ in a few ways:

  • Number of virtual CPUs (vCPUs) per node
  • Amount of memory per node
  • Pricing

By selecting a machine type with more computing resources, you can serve predictions with lower latency or handle more prediction requests at the same time.

Where to specify compute resources

Online prediction

If you want to use a custom-trained model or an AutoML tabular model to serve online predictions, you must specify a machine type when you deploy the Model resource as a DeployedModel to an Endpoint. For other types of AutoML models, Vertex AI configures the machine types automatically.

Specify the machine type (and, optionally, GPU configuration) in the dedicatedResources.machineSpec field of your DeployedModel.
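As a rough illustration, the field can be assembled as a plain dictionary before it is sent in a deploy request. This is a minimal sketch that assumes the REST-style field names machineType, acceleratorType, acceleratorCount, and minReplicaCount. The helper name build_dedicated_resources and the sample values are hypothetical, and the code only builds the payload without calling Vertex AI.

```python
import json


def build_dedicated_resources(machine_type, accelerator_type=None,
                              accelerator_count=0, min_replica_count=1):
    """Return a dict shaped like the dedicatedResources field of a DeployedModel.

    Hypothetical helper for illustration; it does not contact any service.
    """
    machine_spec = {"machineType": machine_type}
    # GPU settings are optional, so include them only when a GPU type is given.
    if accelerator_type is not None:
        if accelerator_count < 1:
            raise ValueError("accelerator_count must be at least 1 when a GPU type is set")
        machine_spec["acceleratorType"] = accelerator_type
        machine_spec["acceleratorCount"] = accelerator_count
    return {"machineSpec": machine_spec, "minReplicaCount": min_replica_count}


if __name__ == "__main__":
    # Sample values only; choose a machine type from the tables in this guide.
    cpu_only = build_dedicated_resources("n1-standard-4")
    with_gpu = build_dedicated_resources("n1-standard-4", "NVIDIA_TESLA_T4", 1)
    print(json.dumps({"dedicatedResources": cpu_only}, indent=2))
    print(json.dumps({"dedicatedResources": with_gpu}, indent=2))
```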

Batch prediction

If you want to get batch predictions from a custom-trained model or an AutoML tabular model, you must specify a machine type when you create a BatchPredictionJob resource. Specify the machine type (and, optionally, GPU configuration) in the dedicatedResources.machineSpec field of your BatchPredictionJob.
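For a batch job, the same idea might look like the sketch below. It assumes the REST-style field names machineSpec, machineType, startingReplicaCount, and maxReplicaCount. The helper name and the check that the maximum is not below the starting count are assumptions made for illustration, not documented behavior.

```python
def build_batch_dedicated_resources(machine_type, starting_replica_count=1,
                                    max_replica_count=1):
    """Return a dict shaped like the dedicatedResources field of a BatchPredictionJob.

    Hypothetical helper for illustration; it does not create a job.
    """
    if starting_replica_count < 1:
        raise ValueError("starting_replica_count must be at least 1")
    # Assumed sanity check: the job cannot scale to fewer replicas than it starts with.
    if max_replica_count < starting_replica_count:
        raise ValueError("max_replica_count must be >= starting_replica_count")
    return {
        "machineSpec": {"machineType": machine_type},
        "startingReplicaCount": starting_replica_count,
        "maxReplicaCount": max_replica_count,
    }


if __name__ == "__main__":
    print(build_batch_dedicated_resources("n1-standard-8", 2, 4))
```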

Machine types

The following tables compare the available machine types for serving predictions from custom-trained models and AutoML tabular models:

E2 Series

Name vCPUs Memory (GB)
e2-standard-2 2 8
e2-standard-4 4 16
e2-standard-8 8 32
e2-standard-16 16 64
e2-standard-32 32 128
e2-highmem-2 2 16
e2-highmem-4 4 32
e2-highmem-8 8 64
e2-highmem-16 16 128
e2-highcpu-2 2 2
e2-highcpu-4 4 4
e2-highcpu-8 8 8
e2-highcpu-16 16 16
e2-highcpu-32 32 32

N1 Series

Name vCPUs Memory (GB)
n1-standard-2 2 7.5
n1-standard-4 4 15
n1-standard-8 8 30
n1-standard-16 16 60
n1-standard-32 32 120
n1-highmem-2 2 13
n1-highmem-4 4 26
n1-highmem-8 8 52
n1-highmem-16 16 104
n1-highmem-32 32 208
n1-highcpu-4 4 3.6
n1-highcpu-8 8 7.2
n1-highcpu-16 16 14.4
n1-highcpu-32 32 28.8

N2 Series

Name vCPUs Memory (GB)
n2-standard-2 2 8
n2-standard-4 4 16
n2-standard-8 8 32
n2-standard-16 16 64
n2-standard-32 32 128
n2-standard-48 48 192
n2-standard-64 64 256
n2-standard-80 80 320
n2-standard-96 96 384
n2-standard-128 128 512
n2-highmem-2 2 16
n2-highmem-4 4 32
n2-highmem-8 8 64
n2-highmem-16 16 128
n2-highmem-32 32 256
n2-highmem-48 48 384
n2-highmem-64 64 512
n2-highmem-80 80 640
n2-highmem-96 96 768
n2-highmem-128 128 864
n2-highcpu-2 2 2
n2-highcpu-4 4 4
n2-highcpu-8 8 8
n2-highcpu-16 16 16
n2-highcpu-32 32 32
n2-highcpu-48 48 48
n2-highcpu-64 64 64
n2-highcpu-80 80 80
n2-highcpu-96 96 96

N2D Series

Name vCPUs Memory (GB)
n2d-standard-2 2 8
n2d-standard-4 4 16
n2d-standard-8 8 32
n2d-standard-16 16 64
n2d-standard-32 32 128
n2d-standard-48 48 192
n2d-standard-64 64 256
n2d-standard-80 80 320
n2d-standard-96 96 384
n2d-standard-128 128 512
n2d-standard-224 224 896
n2d-highmem-2 2 16
n2d-highmem-4 4 32
n2d-highmem-8 8 64
n2d-highmem-16 16 128
n2d-highmem-32 32 256
n2d-highmem-48 48 384
n2d-highmem-64 64 512
n2d-highmem-80 80 640
n2d-highmem-96 96 768
n2d-highcpu-2 2 2
n2d-highcpu-4 4 4
n2d-highcpu-8 8 8
n2d-highcpu-16 16 16
n2d-highcpu-32 32 32
n2d-highcpu-48 48 48
n2d-highcpu-64 64 64
n2d-highcpu-80 80 80
n2d-highcpu-96 96 96
n2d-highcpu-128 128 128
n2d-highcpu-224 224 224

C2 Series

Name vCPUs Memory (GB)
c2-standard-4 4 16
c2-standard-8 8 32
c2-standard-16 16 64
c2-standard-30 30 120
c2-standard-60 60 240

C2D Series

Name vCPUs Memory (GB)
c2d-standard-2 2 8
c2d-standard-4 4 16
c2d-standard-8 8 32
c2d-standard-16 16 64
c2d-standard-32 32 128
c2d-standard-56 56 224
c2d-standard-112 112 448
c2d-highcpu-2 2 4
c2d-highcpu-4 4 8
c2d-highcpu-8 8 16
c2d-highcpu-16 16 32
c2d-highcpu-32 32 64
c2d-highcpu-56 56 112
c2d-highcpu-112 112 224
c2d-highmem-2 2 16
c2d-highmem-4 4 32
c2d-highmem-8 8 64
c2d-highmem-16 16 128
c2d-highmem-32 32 256
c2d-highmem-56 56 448
c2d-highmem-112 112 896

C3 Series

Name vCPUs Memory (GB)
c3-highcpu-4 4 8
c3-highcpu-8 8 16
c3-highcpu-22 22 44
c3-highcpu-44 44 88
c3-highcpu-88 88 176
c3-highcpu-176 176 352

A2 Series

Name vCPUs Memory (GB) GPUs (NVIDIA A100)
a2-highgpu-1g 12 85 1 (A100 40GB)
a2-highgpu-2g 24 170 2 (A100 40GB)
a2-highgpu-4g 48 340 4 (A100 40GB)
a2-highgpu-8g 96 680 8 (A100 40GB)
a2-megagpu-16g 96 1360 16 (A100 40GB)
a2-ultragpu-1g 12 170 1 (A100 80GB)
a2-ultragpu-2g 24 340 2 (A100 80GB)
a2-ultragpu-4g 48 680 4 (A100 80GB)
a2-ultragpu-8g 96 1360 8 (A100 80GB)

A3 Series

Name vCPUs Memory (GB) GPUs (NVIDIA H100)
a3-highgpu-8g 208 1872 8 (H100 80GB)

G2 Series

Name vCPUs Memory (GB) GPUs (NVIDIA L4)
g2-standard-4 4 16 1
g2-standard-8 8 32 1
g2-standard-12 12 48 1
g2-standard-16 16 64 1
g2-standard-24 24 96 2
g2-standard-32 32 128 1
g2-standard-48 48 192 4
g2-standard-96 96 384 8

Learn about pricing for each machine type. Read more about the detailed specifications of these machine types in the Compute Engine documentation about machine types.

Find the ideal machine type

Online prediction

To find the ideal machine type for your use case, we recommend loading your model on multiple machine types and measuring characteristics such as the latency, cost, concurrency, and throughput.
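One possible shape for such a measurement is sketched below. It times sequential calls to a predict callable and reports mean latency, 95th-percentile latency (nearest-rank method), and throughput. The benchmark function, its parameters, and the stand-in predict function are hypothetical. In practice the callable would send a request to your deployed model. Cost and concurrency are not covered by this sketch.

```python
import math
import statistics
import time


def benchmark(predict, requests, warmup=2):
    """Time sequential calls to predict and summarize latency and throughput.

    requests must be a non-empty sequence; the first `warmup` items are sent
    once without being measured.
    """
    if not requests:
        raise ValueError("requests must not be empty")
    for request in requests[:warmup]:
        predict(request)  # warm-up calls are not measured
    latencies = []
    started = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        predict(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    # Nearest-rank 95th percentile of the measured latencies.
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "requests": len(latencies),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": latencies[p95_index],
        "throughput_rps": len(latencies) / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    def fake_predict(request):
        # Stand-in for a real prediction call; sleeps to simulate work.
        time.sleep(0.001)

    print(benchmark(fake_predict, list(range(50))))
```

Running the same harness against endpoints backed by different machine types gives comparable numbers, provided the request set stays the same.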

One way to do this is to run this notebook on multiple machine types and compare the results to find the one that works best for you.

Vertex AI reserves approximately 1 vCPU on each replica for running system processes. This means that running the notebook on a single-core machine type is comparable to serving predictions on a 2-core machine type.
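The adjustment is simple arithmetic, sketched here under the assumption that exactly 1 vCPU is reserved (the text says approximately). The function names are hypothetical.

```python
RESERVED_VCPUS = 1  # approximate reservation per replica, taken from the text above


def serving_vcpus(machine_vcpus, reserved=RESERVED_VCPUS):
    """vCPUs left for the model on a replica after the system reservation."""
    return max(machine_vcpus - reserved, 0)


def comparable_serving_vcpus(benchmark_vcpus, reserved=RESERVED_VCPUS):
    """Machine size for serving that matches a benchmark run on benchmark_vcpus cores."""
    return benchmark_vcpus + reserved


if __name__ == "__main__":
    print(serving_vcpus(2))             # vCPUs available to the model on a 2-vCPU replica
    print(comparable_serving_vcpus(1))  # serving size comparable to a 1-core benchmark
```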

When considering prediction costs, remember that although larger machines cost more, they can lower overall cost because fewer replicas are required to serve the same workload. This is particularly evident for GPUs, which tend to cost more per hour, but can both provide lower latency and cost less overall.
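A worked comparison makes the trade-off concrete. All numbers below (target load, per-replica throughput, and hourly prices) are invented for illustration and are not real Vertex AI prices or benchmarks. The function names are hypothetical.

```python
import math


def replicas_needed(target_qps, qps_per_replica):
    """Smallest whole number of replicas that can carry the target load."""
    return math.ceil(target_qps / qps_per_replica)


def hourly_cost(target_qps, qps_per_replica, price_per_replica_hour):
    """Total hourly cost of serving the target load on one machine type."""
    return replicas_needed(target_qps, qps_per_replica) * price_per_replica_hour


if __name__ == "__main__":
    target_qps = 1000  # hypothetical workload
    # Hypothetical options: (requests per second per replica, price per replica-hour)
    options = {"small machine": (50, 0.20), "large machine": (400, 1.20)}
    for name, (qps, price) in options.items():
        count = replicas_needed(target_qps, qps)
        cost = hourly_cost(target_qps, qps, price)
        print(f"{name}: {count} replicas, about {cost:.2f} per hour")
```

With these made-up figures, the larger machine costs six times as much per replica but needs far fewer replicas, so its total hourly cost comes out lower.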

Batch prediction

For more information, see Choose machine type and replica count.

Optional GPU accelerators

Some configurations, such as the A2 series and G2 series, have a fixed number of GPUs built in.

Other configurations, such as the N1 series, let you optionally add GPUs to accelerate each prediction node.

To add optional GPU accelerators, you must account for several requirements:

  • You can only use GPUs when your Model resource is based on a TensorFlow SavedModel, or when you use a custom container that has been designed to take advantage of GPUs. You can't use GPUs for scikit-learn or XGBoost models.
  • The availability of each type of GPU varies depending on which region you use for your model. Learn which types of GPUs are available in which regions.
  • You can use only one type of GPU for your DeployedModel resource or BatchPredictionJob, and the number of GPUs you can add is limited by the machine type you are using.

The following table shows the optional GPUs that are available for online prediction and how many of each type of GPU you can use with each Compute Engine machine type:

Valid numbers of GPUs for each machine type

Machine type    NVIDIA Tesla K80  NVIDIA Tesla P100  NVIDIA Tesla V100  NVIDIA Tesla P4  NVIDIA Tesla T4
n1-standard-2   1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-standard-4   1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-standard-8   1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-standard-16  2, 4, 8           1, 2, 4            2, 4, 8            1, 2, 4          1, 2, 4
n1-standard-32  4, 8              2, 4               4, 8               2, 4             2, 4
n1-highmem-2    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highmem-4    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highmem-8    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highmem-16   2, 4, 8           1, 2, 4            2, 4, 8            1, 2, 4          1, 2, 4
n1-highmem-32   4, 8              2, 4               4, 8               2, 4             2, 4
n1-highcpu-2    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highcpu-4    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highcpu-8    1, 2, 4, 8        1, 2, 4            1, 2, 4, 8         1, 2, 4          1, 2, 4
n1-highcpu-16   2, 4, 8           1, 2, 4            2, 4, 8            1, 2, 4          1, 2, 4
n1-highcpu-32   4, 8              2, 4               4, 8               2, 4             2, 4
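The table above can be encoded as a lookup to check a configuration before deploying. This is a minimal sketch: the function name is hypothetical, the GPU identifiers assume the API's enum-style names (for example NVIDIA_TESLA_T4), and the data relies on the observation that, in the table, valid counts depend only on the vCPU count of the N1 machine type.

```python
# Valid GPU counts by N1 vCPU count, copied from the table above.
_K80_V100_COUNTS = {2: (1, 2, 4, 8), 4: (1, 2, 4, 8), 8: (1, 2, 4, 8),
                    16: (2, 4, 8), 32: (4, 8)}
_P100_P4_T4_COUNTS = {2: (1, 2, 4), 4: (1, 2, 4), 8: (1, 2, 4),
                      16: (1, 2, 4), 32: (2, 4)}

VALID_GPU_COUNTS = {
    "NVIDIA_TESLA_K80": _K80_V100_COUNTS,
    "NVIDIA_TESLA_P100": _P100_P4_T4_COUNTS,
    "NVIDIA_TESLA_V100": _K80_V100_COUNTS,
    "NVIDIA_TESLA_P4": _P100_P4_T4_COUNTS,
    "NVIDIA_TESLA_T4": _P100_P4_T4_COUNTS,
}

_N1_VARIANTS = ("standard", "highmem", "highcpu")


def is_valid_gpu_config(machine_type, accelerator_type, accelerator_count):
    """Return True if the table lists this GPU count for the N1 machine type."""
    parts = machine_type.split("-")
    if len(parts) != 3 or parts[0] != "n1" or parts[1] not in _N1_VARIANTS:
        return False  # only the N1 types in the table accept optional GPUs here
    if not parts[2].isdigit():
        return False
    counts_by_vcpus = VALID_GPU_COUNTS.get(accelerator_type, {})
    return accelerator_count in counts_by_vcpus.get(int(parts[2]), ())


if __name__ == "__main__":
    print(is_valid_gpu_config("n1-standard-16", "NVIDIA_TESLA_T4", 1))
    print(is_valid_gpu_config("n1-standard-16", "NVIDIA_TESLA_V100", 1))
```

The sketch covers only the count limits in the table. Regional GPU availability and the model-framework requirements listed earlier still need to be checked separately.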

Optional GPUs incur additional costs.
