This document in the Google Cloud Architecture Framework provides recommendations to help you optimize the performance of your Compute Engine, Google Kubernetes Engine (GKE), and serverless resources.
Compute Engine
This section provides guidance to help you optimize the performance of your Compute Engine resources.
Use autoscaling
Managed instance groups (MIGs) let you scale your stateless apps deployed on Compute Engine VMs efficiently. Autoscaling helps your apps continue to deliver predictable performance when the load increases. In a MIG, a group of Compute Engine VMs is launched based on a template that you define. In the template, you configure an autoscaling policy, which specifies one or more signals that the autoscaler uses to scale the group. The autoscaling signals can be schedule-based, like start time or duration, or based on target metrics such as average CPU utilization. For more information, see Autoscaling groups of instances.
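As a sketch of this setup, the following gcloud commands create a MIG from an instance template and attach a CPU-based autoscaling policy. The names `web-mig` and `web-template` and the zone are hypothetical placeholders:

```shell
# Create a MIG from an existing instance template (hypothetical names).
gcloud compute instance-groups managed create web-mig \
    --zone=us-central1-a \
    --template=web-template \
    --size=2

# Autoscale between 2 and 10 VMs, targeting 60% average CPU utilization.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --target-cpu-utilization=0.60 \
    --cool-down-period=120
```

The cool-down period tells the autoscaler how long a new VM needs to initialize before its metrics are considered reliable.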
Disable simultaneous multithreading
Each virtual CPU (vCPU) that you allocate to a Compute Engine VM is implemented as a single hardware multithread. By default, two vCPUs share a physical CPU core. This architecture is called simultaneous multithreading (SMT).
For workloads that are highly parallel or that perform floating point calculations (such as transcoding, Monte Carlo simulations, genetic sequence analysis, and financial risk modeling), you can improve performance by disabling SMT. For more information, see Set the number of threads per core.
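A minimal sketch of disabling SMT at VM creation time, with a hypothetical VM name and zone:

```shell
# Setting one thread per core disables SMT, so each vCPU maps to a
# dedicated physical core (hypothetical VM name and zone).
gcloud compute instances create sim-node \
    --zone=us-central1-a \
    --machine-type=c2-standard-8 \
    --threads-per-core=1
```

Note that with one thread per core, the number of vCPUs visible to the guest OS is half the machine type's nominal vCPU count.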
Use GPUs
For workloads such as machine learning and visualization, you can add graphics processing units (GPUs) to your VMs. Compute Engine provides NVIDIA GPUs in passthrough mode so that your VMs have direct control over the GPUs and the associated memory. For graphics-intensive workloads such as 3D visualization, you can use NVIDIA RTX virtual workstations. After you deploy the workloads, monitor the GPU usage and review the options for optimizing GPU performance.
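The following is a sketch of attaching a GPU to a VM, assuming a hypothetical VM name and a zone where the chosen GPU model is available:

```shell
# Attach one NVIDIA T4 GPU to an N1 VM (hypothetical names).
# GPU VMs can't live-migrate, so a TERMINATE maintenance policy is required.
gcloud compute instances create gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```

You must also install the NVIDIA driver on the VM before the GPU can be used.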
Use compute-optimized machine types
Workloads like gaming, media transcoding, and high performance computing (HPC) require consistently high performance per CPU core. Google recommends that you use compute-optimized machine types for the VMs that run such workloads. Compute-optimized VMs are built on an architecture that uses features like non-uniform memory access (NUMA) for optimal and reliable performance.
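As an illustrative sketch, selecting a compute-optimized machine type is just a matter of the machine-type flag; the VM name and zone here are hypothetical:

```shell
# C2 machine types are compute-optimized, offering consistently high
# per-core performance and exposing NUMA topology to the guest.
gcloud compute instances create hpc-node \
    --zone=us-central1-a \
    --machine-type=c2-standard-16
```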
Tightly coupled HPC workloads have a unique set of requirements for achieving peak efficiency in performance. For more information, see the following documentation:
- Best practices for running tightly coupled HPC applications on Compute Engine
Choose appropriate storage
Google Cloud offers a wide range of storage options for Compute Engine VMs: Persistent disks, local solid-state drive (SSD) disks, Filestore, and Cloud Storage. For design recommendations and best practices to optimize the performance of each of these storage options, see Optimize storage performance.
Google Kubernetes Engine
This section provides guidance to help you optimize the performance of your Google Kubernetes Engine (GKE) resources.
Use cluster autoscaling
You can automatically resize the node pools in a GKE cluster to match the current load by using the cluster autoscaler feature. Autoscaling helps your apps continue to deliver predictable performance when the load increases. The cluster autoscaler resizes node pools automatically based on the resource requests (rather than actual resource utilization) of the Pods running on the nodes. When you use autoscaling, there can be a trade-off between performance and cost. Review the best practices for configuring cluster autoscaling efficiently.
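A minimal sketch of creating a cluster with autoscaling enabled, using a hypothetical cluster name and zone:

```shell
# The cluster autoscaler adds nodes when Pods are unschedulable and
# removes underutilized nodes, within the configured bounds.
gcloud container clusters create perf-cluster \
    --zone=us-central1-a \
    --num-nodes=3 \
    --enable-autoscaling \
    --min-nodes=1 \
    --max-nodes=6
```

The min/max bounds apply per node pool, which is how you express the performance-versus-cost trade-off mentioned above.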
Use C2D VMs
You can improve the performance of compute-intensive containerized workloads by using C2D machine types. You can add C2D nodes to your GKE clusters by choosing a C2D machine type in your node pools.
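For example, the following sketch adds a C2D node pool to an existing cluster; the cluster and pool names are hypothetical:

```shell
# Add a node pool of compute-optimized C2D machines to an existing
# cluster (hypothetical names).
gcloud container node-pools create c2d-pool \
    --cluster=perf-cluster \
    --zone=us-central1-a \
    --machine-type=c2d-standard-8 \
    --num-nodes=2
```

You can then steer compute-intensive Pods to this pool with a node selector on the machine-family node label.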
Disable simultaneous multithreading
Simultaneous multithreading (SMT) can increase application throughput significantly for general computing tasks and for workloads that need high I/O. But for workloads in which both virtual cores on a physical core are compute-bound, SMT can cause inconsistent performance. To get better and more predictable performance, you can disable SMT for your GKE nodes by setting the number of vCPUs per core to 1.
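A sketch of disabling SMT for a node pool, assuming a hypothetical existing cluster named `perf-cluster`:

```shell
# One thread per core disables SMT for every node in the pool
# (hypothetical cluster and pool names).
gcloud container node-pools create nosmt-pool \
    --cluster=perf-cluster \
    --zone=us-central1-a \
    --machine-type=c2-standard-8 \
    --threads-per-core=1
```

As with Compute Engine VMs, each node then reports half the machine type's nominal vCPU count, so size Pod resource requests accordingly.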
Use GPUs
For compute-intensive workloads like image recognition and video transcoding, you can accelerate performance by creating node pools that use GPUs. For more information, see Running GPUs.
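As a sketch, the following adds a GPU node pool to a hypothetical existing cluster:

```shell
# Create a node pool whose nodes each have one NVIDIA T4 GPU
# (hypothetical cluster and pool names).
gcloud container node-pools create gpu-pool \
    --cluster=perf-cluster \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --num-nodes=1
```

After the pool is created, you must deploy the NVIDIA driver installer DaemonSet before Pods can request the `nvidia.com/gpu` resource.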
Use container-native load balancing
Container-native load balancing enables load balancers to distribute traffic directly and evenly to Pods. This approach provides better network performance and improved visibility into network latency between the load balancer and the Pods. Because of these benefits, container-native load balancing is the recommended solution for load balancing through Ingress.
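Container-native load balancing is driven by the NEG annotation on a Service and requires a VPC-native (alias-IP) cluster. The following is a minimal sketch; the Service name and app label are hypothetical:

```shell
# The cloud.google.com/neg annotation places Pod IPs in network endpoint
# groups, so the Ingress load balancer sends traffic directly to Pods
# instead of routing through node ports (hypothetical names).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
EOF
```

On newer VPC-native clusters, this annotation is applied by default for Services used by an Ingress.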
Define a compact placement policy
Tightly coupled batch workloads need low network latency between the nodes in the GKE node pool. You can deploy such workloads to single-zone node pools, and ensure that the nodes are physically close to each other by defining a compact placement policy. For more information, see Define compact placement for GKE nodes.
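A sketch of creating a node pool with compact placement, again assuming a hypothetical existing cluster:

```shell
# COMPACT placement asks GKE to provision the pool's nodes physically
# close together within the zone, reducing inter-node network latency
# (hypothetical cluster and pool names).
gcloud container node-pools create tight-pool \
    --cluster=perf-cluster \
    --zone=us-central1-a \
    --machine-type=c2-standard-8 \
    --placement-type=COMPACT \
    --num-nodes=4
```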
Serverless compute services
This section provides guidance to help you optimize the performance of your serverless compute services in Google Cloud: Cloud Run and Cloud Functions. These services provide autoscaling capabilities, where the underlying infrastructure handles scaling automatically. By using these serverless services, you can reduce the effort to scale your microservices and functions, and focus on optimizing performance at the application level.
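Even with automatic scaling, a few deploy-time settings influence serverless performance. The following Cloud Run sketch uses a hypothetical service name and image path:

```shell
# Minimum instances keep warm instances around to avoid cold starts;
# CPU, memory, and concurrency control per-instance throughput
# (hypothetical service name and image path).
gcloud run deploy web-api \
    --image=us-docker.pkg.dev/my-project/repo/web-api \
    --region=us-central1 \
    --min-instances=1 \
    --cpu=2 \
    --memory=1Gi \
    --concurrency=80
```

Tune concurrency to match what one instance of your app can actually handle; setting it too high trades latency for density.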
For more information, see the following documentation:
- Optimizing performance for Cloud Run services
- Optimizing Java applications for Cloud Run
- Optimizing performance in Cloud Functions
What's next
Review the best practices for optimizing the performance of your storage, networking, database, and analytics resources:
- Optimize storage performance.
- Optimize networking performance.
- Optimize database performance.
- Optimize analytics performance.