HPC-optimized machine family for Compute Engine


HPC-optimized instances are ideal for compute-intensive and high performance computing (HPC) workloads. They offer the highest performance per core and are built on an architecture that uses features like non-uniform memory access (NUMA) to deliver reliable, uniform performance.

Machine series and workloads
H4D machine series (Preview)
  • HPC workloads and multi-node workloads
  • Manufacturing
  • Weather forecasting
  • Electronic design automation (EDA)
  • Healthcare and life sciences
  • Scientific computing
H3 machine series
  • HPC workloads
  • Computational fluid dynamics
  • Crash safety
  • Genomics
  • Financial modeling
  • General scientific and engineering computing

The following machine series are available in this machine family:

  • H4D instances (Preview) are powered by AMD EPYC Turin processors, which have a base frequency of 2.7 GHz and a maximum frequency of 4.1 GHz. H4D instances have 192 cores (vCPUs) and up to 1,488 GB of memory. H4D instances can be used with Local SSD storage and Cloud RDMA networking.
  • H3 instances are powered by two 4th-generation Intel Xeon Scalable processors (code-named Sapphire Rapids), which have an all-core frequency of 3.0 GHz. H3 instances have 88 vCPUs and 352 GB of DDR5 memory.
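
As a quick way to compare the two series, the following sketch derives memory per vCPU from the figures quoted above. The dictionary layout and function name are illustrative only and aren't part of any Compute Engine API; the H4D entry uses the maximum memory figure.

```python
# Figures taken from the machine series descriptions above.
# The structure and names here are illustrative only.
SERIES = {
    "H4D": {"vcpus": 192, "memory_gb": 1488},  # maximum memory quoted for H4D
    "H3": {"vcpus": 88, "memory_gb": 352},
}


def memory_per_vcpu_gb(series: str) -> float:
    """Return GB of memory per vCPU for the named series."""
    spec = SERIES[series]
    return spec["memory_gb"] / spec["vcpus"]


for name in SERIES:
    print(f"{name}: {memory_per_vcpu_gb(name):.2f} GB per vCPU")
```

Because SMT is disabled on both series, each vCPU corresponds to one physical core, so these ratios can also be read as memory per core.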

H4D machine series

H4D instances are powered by 5th Generation AMD EPYC processors (code-named Turin) and Titanium offload processors.

H4D instances deliver high performance, low cost, and scalability for multi-node workloads. H4D instances are single-threaded and are optimized for tightly-coupled applications that scale across multiple nodes. They use technologies like RDMA-enabled 200 Gbps networking and Cluster Director to prioritize performance and workload-specific optimizations. You can also use Dynamic Workload Scheduler for scheduled or immediate cluster deployment, which makes H4D well suited to bursty HPC workloads.

An H4D instance uses all the vCPUs on an entire host server. H4D instances can use the entire host network bandwidth and come with a default network bandwidth rate of up to 200 Gbps. However, the bandwidth from the instance to the internet is limited to 1 Gbps.
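
To make the gap between the two bandwidth figures concrete, here is a rough back-of-the-envelope sketch. It assumes an idealized link running at the full stated rate with no protocol overhead, and the 1 TB dataset size is a hypothetical example; real throughput depends on the destination and other factors.

```python
def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Ideal transfer time for size_gb decimal gigabytes at bandwidth_gbps gigabits/s."""
    return size_gb * 8 / bandwidth_gbps


DATASET_GB = 1_000  # hypothetical 1 TB dataset

within_network = transfer_seconds(DATASET_GB, 200)  # up to 200 Gbps default rate
to_internet = transfer_seconds(DATASET_GB, 1)       # 1 Gbps internet egress limit

print(f"At 200 Gbps: {within_network:.0f} s")
print(f"At 1 Gbps:   {to_internet:.0f} s")
```

Under these assumptions the same dataset takes 200 times longer to send to the internet than to another destination that can use the full default rate.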

Simultaneous multithreading (SMT) is disabled for H4D instances and can't be enabled. To ensure consistent performance, resources are also not overcommitted.

H4D instances are available on-demand, or with one- and three-year committed use discounts (CUDs). To compare these methods, see Compute Engine instances provisioning models.

H4D Limitations

The H4D machine series has the following restrictions:

  • The H4D machine series is available only as predefined machine types. Custom machine types aren't available.
  • You can't use GPUs with H4D instances.
  • Outbound data transfer is limited to 1 Gbps.
  • You can't create machine images from H4D instances.
  • H4D machine images can't be used to create disks.
  • You can't share disks between instances, either in multi-writer mode or read-only mode.
  • Hyperdisk Balanced performance is capped at 15,000 IOPS and 240 MBps throughput.
  • Live migration isn't supported for H4D instances.

H4D machine types

Machine types | vCPUs (1) | Memory (GB) | Titanium SSD | Default egress bandwidth (Gbps) (2) | NUMA nodes
h4d-highmem-192-lssd | 192 | 1,488 | (10 x 375 GiB) 3,750 GiB | Up to 200 Gbps | 2

1 A vCPU represents an entire core—no simultaneous multithreading (SMT).
2 Default egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

Supported disk types for H4D

H4D instances can use the following block storage types:

  • Hyperdisk Balanced (hyperdisk-balanced)
  • Local Titanium SSD

Disk and capacity limits

The following restrictions apply:

  • The number of Hyperdisk volumes can't exceed 64 per VM.
  • The maximum total disk capacity across all disks can't exceed 512 TiB.

For details about the capacity limits, see Hyperdisk capacity limits per VM.

H4D storage limits are described in the following table:

Maximum number of disks per instance:

Machine types | All Hyperdisk types | Hyperdisk Balanced | Hyperdisk Throughput | Hyperdisk Extreme
h4d-highmem-192-lssd | 64 | 8 | 0 | 0
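
The limits above can be checked before you request disks. The following is a minimal sketch, assuming a plan expressed as (disk type, size in TiB) pairs; the function name and data layout are hypothetical, it covers only Hyperdisk volumes (not Local Titanium SSD), and it isn't a substitute for the validation that Compute Engine itself performs.

```python
from collections import Counter

# Limits taken from the text and table above; names are illustrative.
MAX_HYPERDISK_VOLUMES = 64
MAX_TOTAL_CAPACITY_TIB = 512
MAX_PER_TYPE = {
    "hyperdisk-balanced": 8,
    "hyperdisk-throughput": 0,
    "hyperdisk-extreme": 0,
}


def check_h4d_disk_plan(disks: list[tuple[str, float]]) -> list[str]:
    """Return limit violations for a list of (disk_type, size_tib) pairs."""
    problems = []
    counts = Counter(disk_type for disk_type, _ in disks)
    for disk_type, count in counts.items():
        allowed = MAX_PER_TYPE.get(disk_type)
        if allowed is None:
            problems.append(f"{disk_type}: not a Hyperdisk type listed for H4D")
        elif count > allowed:
            problems.append(f"{disk_type}: {count} volumes exceeds limit of {allowed}")
    if len(disks) > MAX_HYPERDISK_VOLUMES:
        problems.append(f"{len(disks)} volumes exceeds limit of {MAX_HYPERDISK_VOLUMES}")
    total_tib = sum(size for _, size in disks)
    if total_tib > MAX_TOTAL_CAPACITY_TIB:
        problems.append(f"{total_tib} TiB exceeds limit of {MAX_TOTAL_CAPACITY_TIB} TiB")
    return problems


print(check_h4d_disk_plan([("hyperdisk-balanced", 4.0)] * 9))
```

An empty list means the plan fits within the limits listed in this section.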

Network support for H4D instances

H4D instances require gVNIC network interfaces. H4D supports up to 200 Gbps network bandwidth for standard networking. Instance-to-internet egress bandwidth is limited to 1 Gbps.

RDMA-capable instances require at least two network interfaces (vNICs). One vNIC is used for normal networking and is fully connected to the Google network and, optionally, the internet. This vNIC uses the gVNIC driver. The other vNIC uses an Intel iDPF/iRDMA driver and is used for RDMA communication. The RDMA vNIC doesn't connect to the internet.

Before migrating to H4D or creating H4D instances, make sure that the operating system image that you use is fully supported for H4D. Fully supported images include support for 200 Gbps network bandwidth. If you are using Cloud RDMA, then the OS image must also support the IRDMA network interface type. If your H4D instance is using an operating system that is not fully supported or has earlier versions of the network drivers, then your instance might not be able to achieve the maximum network bandwidth for H4D instances.
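
The two-interface requirement for Cloud RDMA can be expressed as a simple pre-flight check. This is a sketch only: the function is hypothetical, and the strings "GVNIC" and "IRDMA" are assumed labels for the two interface types described above, so adjust them to whatever your tooling actually reports.

```python
def check_rdma_nic_layout(nic_types: list[str]) -> list[str]:
    """Return problems with a planned list of network interface types."""
    problems = []
    if len(nic_types) < 2:
        problems.append("need at least two network interfaces")
    if "GVNIC" not in nic_types:
        problems.append("need one gVNIC interface for regular networking")
    if "IRDMA" not in nic_types:
        problems.append("need one IRDMA interface for RDMA traffic")
    return problems


print(check_rdma_nic_layout(["GVNIC", "IRDMA"]))  # valid layout, no problems
print(check_rdma_nic_layout(["GVNIC"]))           # missing the RDMA interface
```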

Maintenance experience for H4D instances

During the lifecycle of a Compute Engine instance, the host machine that your instance runs on undergoes multiple host events. A host event can include the regular maintenance of Compute Engine infrastructure, or in rare cases, a host error. Compute Engine also applies some non-disruptive lightweight upgrades for the hypervisor and network in the background.

The H4D machine series offers the following features related to host maintenance:

Machine type | Typical scheduled maintenance event frequency | Maintenance behavior | Advanced notification | On-demand maintenance | Simulate maintenance
h4d-highmem-192-lssd | Minimum of 30 days | Terminates with Local SSD data persistence | 7 days | Yes | Yes

The maintenance frequencies shown in the previous table are approximations, not guarantees. Compute Engine might occasionally perform maintenance more frequently.

H3 machine series

H3 instances are powered by the 4th generation Intel Xeon Scalable processors (code-named Sapphire Rapids), DDR5 memory, and Titanium offload processors.

H3 instances offer the best price-performance for compute-intensive high performance computing (HPC) workloads in Compute Engine. H3 instances are single-threaded and are ideal for a variety of modeling and simulation workloads, including computational fluid dynamics, crash safety, genomics, financial modeling, and general scientific and engineering computing. H3 instances support compact placement, which is optimized for tightly-coupled applications that scale across multiple nodes.

The H3 series is available in one size, comprising an entire host server. To save on licensing costs, you can customize the number of visible cores, but you are charged the same price for the instance. H3 instances can use the entire host network bandwidth and come with a default network bandwidth rate of up to 200 Gbps. However, the bandwidth from the instance to the internet is limited to 1 Gbps.

Simultaneous multithreading (SMT) is disabled for H3 instances and can't be enabled. To ensure consistent performance, resources are also not overcommitted.

H3 instances are available on-demand, or with one- and three-year committed use discounts (CUDs). H3 instances can be used with Google Kubernetes Engine.

H3 Limitations

The H3 machine series has the following restrictions:

  • The H3 machine series is available only as a predefined machine type. Custom machine types aren't available.
  • You can't use GPUs with H3 instances.
  • Outbound data transfer is limited to 1 Gbps.
  • Persistent Disk and Google Cloud Hyperdisk performance is capped at 15,000 IOPS and 240 MBps throughput.
  • H3 instances don't support machine images.
  • H3 instances support only the NVMe storage interface.
  • H3 instance images can't be used to create disks.
  • H3 instances don't support sharing disks between instances, either in multi-writer mode or read-only mode.

H3 machine types

H3 instances are available as a predefined configuration with 88 vCPUs and 352 GB of memory.

Machine types | vCPUs (1) | Memory (GB) | Local SSD | Default egress bandwidth (Gbps) (2)
h3-standard-88 | 88 | 352 | Not supported | Up to 200 Gbps

1 A vCPU represents an entire core—no simultaneous multithreading (SMT).
2 Default egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.

Supported disk types for H3

H3 instances can use the following block storage types:

  • Balanced Persistent Disk (pd-balanced)
  • Hyperdisk Balanced (hyperdisk-balanced)
  • Hyperdisk Throughput (hyperdisk-throughput)

Disk and capacity limits

If supported by the machine type, you can attach a mixture of Hyperdisk and Persistent Disk volumes to an instance, but the following restrictions apply:

  • The combined number of both Hyperdisk and Persistent Disk volumes can't exceed 128 per instance.
  • The maximum total disk capacity (in TiB) across all disk types can't exceed:

    • 512 TiB for all Hyperdisk
    • 512 TiB for a mixture of Hyperdisk and Persistent Disk
    • 257 TiB for all Persistent Disk

For details about the capacity limits, see Hyperdisk size and attachment limits and Persistent Disk maximum capacity.
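
Because the capacity cap depends on which disk families are attached, it can help to compute it explicitly. The sketch below encodes the rules listed above; the function names and the (disk type, size in TiB) layout are hypothetical, disk families are inferred from the `pd-` and `hyperdisk-` name prefixes, and per-type volume limits aren't enforced here.

```python
MAX_VOLUMES = 128  # combined Hyperdisk and Persistent Disk volumes per instance


def capacity_cap_tib(disk_types: list[str]) -> int:
    """Return the total capacity cap in TiB for the given mix of disk types."""
    has_hyperdisk = any(t.startswith("hyperdisk-") for t in disk_types)
    has_pd = any(t.startswith("pd-") for t in disk_types)
    if has_pd and not has_hyperdisk:
        return 257  # all Persistent Disk
    return 512      # all Hyperdisk, or a mixture of both


def check_h3_disk_plan(disks: list[tuple[str, float]]) -> list[str]:
    """Return limit violations for a list of (disk_type, size_tib) pairs."""
    problems = []
    if len(disks) > MAX_VOLUMES:
        problems.append(f"{len(disks)} volumes exceeds limit of {MAX_VOLUMES}")
    cap = capacity_cap_tib([disk_type for disk_type, _ in disks])
    total_tib = sum(size for _, size in disks)
    if total_tib > cap:
        problems.append(f"{total_tib} TiB exceeds limit of {cap} TiB")
    return problems


print(check_h3_disk_plan([("pd-balanced", 64.0)] * 5))
```

In this example, five 64 TiB Persistent Disk volumes total 320 TiB, which is over the 257 TiB cap for a Persistent Disk-only configuration but would fit under the 512 TiB cap if a Hyperdisk volume were part of the mix.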

H3 storage limits are described in the following table:

Maximum number of disks per instance:

Machine types | All disk types (1) | All Hyperdisk types | Hyperdisk Balanced | Hyperdisk Throughput | Hyperdisk Extreme
h3-standard-88 | 128 | 64 | 8 | 64 | 0

1 This limit applies to Persistent Disk and Hyperdisk, but doesn't include Local SSD disks.

Network support for H3 instances

H3 instances require gVNIC network interfaces. H3 supports up to 200 Gbps network bandwidth for standard networking.

Before migrating to H3 or creating H3 instances, make sure that the operating system image that you use supports the gVNIC driver. To get the best possible performance on H3 instances, on the Networking features tab of the OS details table, choose an OS image that supports both "Tier_1 Networking" and "200 Gbps network bandwidth". These images include an updated gVNIC driver, even if the guest OS shows the gve driver version as 1.0.0. If your H3 instance is using an operating system with an older version of the gVNIC driver, this is still supported but the instance might experience suboptimal performance such as less network bandwidth or higher latency.

If you use a custom OS image with the H3 machine series, you can manually install the most recent gVNIC driver. The gVNIC driver version v1.4.2 or later is recommended for use with H3 instances. Google recommends using the latest gVNIC driver version to benefit from additional features and bug fixes.
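
For custom images, a version comparison like the following can flag drivers older than the recommended v1.4.2. This is a sketch under assumptions: the helper names are hypothetical, and it expects that you have already obtained the driver version as a string such as "v1.4.2" or "1.4.2". As noted above, some supported images report 1.0.0 even though they include an updated driver, so treat a failed check as a prompt to investigate rather than as proof of an outdated driver.

```python
import re

RECOMMENDED_MIN = "1.4.2"


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading dotted numeric version, ignoring a 'v' prefix and any suffix."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_recommended(version: str, minimum: str = RECOMMENDED_MIN) -> bool:
    """Compare versions numerically, so that 1.10.0 ranks above 1.4.2."""
    return parse_version(version) >= parse_version(minimum)


print(meets_recommended("v1.4.2"))
print(meets_recommended("1.3.9"))
print(meets_recommended("1.10.0"))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "1.10.0" would sort before "1.4.2".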

Maintenance experience for H3 instances

During the lifecycle of a Compute Engine instance, the host machine that your instance runs on undergoes multiple host events. A host event can include the regular maintenance of Compute Engine infrastructure, or in rare cases, a host error. Compute Engine also applies some non-disruptive lightweight upgrades for the hypervisor and network in the background.

The H3 machine series offers the following features related to host maintenance:

Machine type | Typical scheduled maintenance event frequency | Maintenance behavior | Advanced notification | On-demand maintenance | Simulate maintenance
h3-standard-88 | Minimum of 30 days | Live migrate | 7 days | Yes | Yes

The maintenance frequencies shown in the previous table are approximations, not guarantees. Compute Engine might occasionally perform maintenance more frequently.

What's next