Google Cloud Architecture Framework: Performance optimization

This document in the Google Cloud Architecture Framework describes best practices to optimize the performance of your workloads in Google Cloud.


Strategies

Evaluate performance requirements. Determine the priority of each of your applications and the minimum performance that each one requires.

Use scalable design patterns. Improve scalability and performance with autoscaling, compute choices, and storage configurations.

Best practices

  • Use autoscaling and data processing.
  • Use GPUs and TPUs to increase performance.
  • Identify apps to tune.

Use autoscaling and data processing

Use autoscaling so that as load increases or decreases, the services add or release resources to match.

Compute Engine autoscaling

Managed instance groups (MIGs) let you run stateless apps on multiple identical VMs that are created from an instance template. You can configure an autoscaling policy to scale the group based on CPU utilization, load-balancing capacity, Cloud Monitoring metrics, schedules, and, for zonal MIGs, a queue-based workload such as Pub/Sub.
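As a rough illustration of utilization-based scaling, the following Python sketch computes a target group size from average CPU utilization. The function `recommended_group_size`, its parameter names, and its proportional rule are hypothetical simplifications for explanation only. They are not the algorithm that the Compute Engine autoscaler uses.

```python
import math


def recommended_group_size(current_size: int, avg_cpu_utilization: float,
                           target_utilization: float,
                           min_replicas: int, max_replicas: int) -> int:
    """Hypothetical proportional scaling rule for a group of identical VMs.

    Sizes the group so that average CPU utilization moves toward the target,
    then clamps the result to [min_replicas, max_replicas]. This is a
    simplification for illustration, not Compute Engine's algorithm.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    # Current load expressed as the number of fully busy VMs it represents.
    busy_vm_equivalents = current_size * avg_cpu_utilization
    # VMs needed so that each runs at or below the target utilization;
    # the small epsilon keeps floating-point noise from adding an extra VM.
    desired = math.ceil(busy_vm_equivalents / target_utilization - 1e-9)
    return max(min_replicas, min(max_replicas, desired))
```

For example, a group of 10 VMs averaging 75% CPU against a 50% target would grow to 15 VMs, subject to the configured maximum. The managed autoscaler also applies behavior that this sketch omits, such as an initialization period for new VMs and scale-in controls.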

Google Kubernetes Engine autoscaling

You can use the cluster autoscaler feature in Google Kubernetes Engine (GKE) to resize your cluster's node pools as the demands of your workloads vary. The cluster autoscaler automatically increases or decreases the size of a node pool based on the resource requests of the Pods running on that node pool's nodes, not on their actual resource utilization.
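To make the distinction between requests and utilization concrete, the following sketch estimates how many identical nodes a set of Pods needs by packing their CPU and memory requests with a first-fit-decreasing heuristic. `PodRequest` and `nodes_needed` are hypothetical, and the heuristic is a stand-in: GKE's cluster autoscaler uses its own scheduling logic, which this code does not reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PodRequest:
    name: str
    cpu_millicores: int  # requested CPU, not observed usage
    memory_mib: int      # requested memory, not observed usage


def nodes_needed(pods: List[PodRequest], node_cpu_millicores: int,
                 node_memory_mib: int) -> int:
    """Estimate node count by packing Pod *requests* onto identical nodes.

    Hypothetical first-fit-decreasing heuristic: largest requests are
    placed first, each on the first node with enough free capacity.
    """
    free: List[List[int]] = []  # per node: [free_cpu_millicores, free_memory_mib]
    ordered = sorted(pods, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True)
    for pod in ordered:
        if pod.cpu_millicores > node_cpu_millicores or pod.memory_mib > node_memory_mib:
            raise ValueError(f"{pod.name} does not fit on an empty node")
        for node in free:
            if node[0] >= pod.cpu_millicores and node[1] >= pod.memory_mib:
                node[0] -= pod.cpu_millicores
                node[1] -= pod.memory_mib
                break
        else:
            # No existing node has room, so "add" a node to the pool.
            free.append([node_cpu_millicores - pod.cpu_millicores,
                         node_memory_mib - pod.memory_mib])
    return len(free)
```

Observed usage never enters the calculation: four Pods that each request 1,500 millicores need two nodes with 4,000 millicores of allocatable CPU each, even if the Pods sit idle. Setting requests that reflect real needs therefore keeps node pools from growing larger than necessary.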

Serverless autoscaling

Serverless compute options include Cloud Run, App Engine, and Cloud Functions, each of which provides autoscaling capabilities. Use these serverless options to scale your microservices or functions.
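For request-driven services such as Cloud Run, where each instance handles up to a configured maximum number of concurrent requests, the instance count needed for a given load can be approximated with simple arithmetic. The function `instances_for_load` below is a hypothetical back-of-the-envelope model, not Cloud Run's actual scaling logic, which also takes other signals such as CPU utilization into account.

```python
def instances_for_load(concurrent_requests: int, max_concurrency: int,
                       min_instances: int, max_instances: int) -> int:
    """Estimate instances needed so that no instance exceeds max_concurrency.

    Hypothetical model: assumes requests spread evenly across instances and
    ignores cold starts, CPU utilization, and request rate.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests cannot be negative")
    # Integer ceiling division: 250 requests at 80 per instance -> 4 instances.
    needed = (concurrent_requests + max_concurrency - 1) // max_concurrency
    # Clamp to the configured bounds; min_instances=0 allows scaling to zero.
    return max(min_instances, min(max_instances, needed))
```

With `min_instances=0`, zero in-flight requests yield zero instances, mirroring scale-to-zero behavior; a nonzero minimum keeps instances warm at the cost of paying for idle capacity.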

Data processing

Dataproc and Dataflow offer autoscaling options to scale your data pipelines and data processing. Use these options to allow your pipelines to access more computing resources based on the processing load.
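Autoscaling a pipeline amounts to answering the question "how many workers are needed to keep up with incoming data and clear the backlog in time?" The following hypothetical sketch expresses that reasoning. `workers_to_drain`, its parameters, and its assumption of linear per-worker throughput are illustrative only and do not describe how Dataflow or Dataproc make scaling decisions.

```python
import math


def workers_to_drain(backlog_records: int, incoming_per_sec: float,
                     per_worker_per_sec: float, target_drain_sec: float,
                     max_workers: int) -> int:
    """Hypothetical estimate of workers needed to keep up with new input and
    clear an existing backlog within target_drain_sec seconds.

    Assumes throughput scales linearly with worker count and that at least
    one worker always runs.
    """
    if per_worker_per_sec <= 0 or target_drain_sec <= 0:
        raise ValueError("throughput and drain time must be positive")
    # Required throughput = steady input rate + backlog spread over the window.
    required_per_sec = incoming_per_sec + backlog_records / target_drain_sec
    workers = max(1, math.ceil(required_per_sec / per_worker_per_sec))
    # Respect the upper bound, as a managed service's maximum worker setting would.
    return min(workers, max_workers)
```

For example, a 60,000-record backlog, 500 new records per second, and workers that each process 250 records per second call for 4 workers to clear the backlog in two minutes. A bound like `max_workers` corresponds to the limits you set on a managed service's autoscaling, such as a maximum number of workers.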

Design questions

  • Which of your applications have variable user load or processing requirements?
  • Which of your data processing pipelines have variable data requirements?


Recommendations

  • Use Cloud Load Balancing to provide a global endpoint.
  • Use managed instance groups with Compute Engine to automatically scale.
  • Use the cluster autoscaler in GKE to automatically scale the cluster.
  • Use App Engine to autoscale your Platform-as-a-Service (PaaS) application.
  • Use Cloud Run or Cloud Functions to autoscale your function or microservice.


Use GPUs and TPUs to increase performance

Google Cloud provides options to accelerate the performance of your workloads. You can use these specialized hardware platforms to increase your application and data processing performance.

Graphics Processing Unit (GPU)

Compute Engine provides GPUs that you can add to your virtual machine instances. You can use these GPUs to accelerate specific workloads on your instances such as machine learning and data processing.

Tensor Processing Unit (TPU)

A TPU is a matrix processor that Google designed specifically for machine learning workloads. TPUs are best suited to massive matrix operations, which they process in a large pipeline that requires significantly less memory access.
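A back-of-the-envelope calculation suggests why large matrix operations suit this kind of hardware: the floating-point operations in a matrix multiplication grow faster than the data it must move, so larger matrices perform more arithmetic per byte of memory traffic. The function `matmul_arithmetic_intensity` is a hypothetical helper that assumes ideal on-chip data reuse; it is not a TPU performance model.

```python
def matmul_arithmetic_intensity(m: int, k: int, n: int, bytes_per_element: int) -> float:
    """FLOPs per byte moved for C[m x n] = A[m x k] @ B[k x n].

    Assumes each input is read once and the output is written once
    (ideal reuse), which is a lower bound on real memory traffic.
    """
    if min(m, k, n, bytes_per_element) <= 0:
        raise ValueError("dimensions and element size must be positive")
    flops = 2 * m * k * n  # one multiply and one add per term of each dot product
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved
```

For square n×n matrices with 2-byte elements such as bfloat16, the ratio simplifies to 2n³ / (2 · 3n²) = n/3 FLOPs per byte, so it grows linearly with matrix size: 1 FLOP per byte at n = 3, but 100 FLOPs per byte at n = 300.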


Identify apps to tune

Application Performance Management (APM) includes tools to help you reduce latency and cost, so that you can run more efficient applications. With Cloud Trace, Cloud Debugger, and Cloud Profiler, you gain insight into how your code and services function, and you can troubleshoot if needed.


Cloud Trace

Latency plays a major role in your users' experience. As your application backend grows more complex, or as you adopt a microservice architecture, it becomes challenging to identify latency in inter-service communication and to pinpoint bottlenecks. Cloud Trace and OpenTelemetry tools help you collect latency data from your deployments at scale and analyze it quickly.
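To show what a trace adds, the following Python sketch records nested, timed spans for a request that calls several downstream services, then reports which call dominated the latency. `ToyTracer`, `Span`, and their methods are hypothetical and use only the standard library; in practice you would instrument code with OpenTelemetry and export the spans to Cloud Trace instead.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]  # name of the enclosing span, or None for the root
    start: float
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class ToyTracer:
    """Hypothetical in-process tracer; NOT the Cloud Trace or OpenTelemetry API.

    Real tracers identify spans by unique trace and span IDs; this sketch
    uses span names for brevity, so names must be unique within a trace.
    """
    clock: Callable[[], float] = time.perf_counter
    spans: List[Span] = field(default_factory=list)
    _stack: List[str] = field(default_factory=list, init=False, repr=False)

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        """Time the enclosed block and record it as a child of the current span."""
        parent = self._stack[-1] if self._stack else None
        current = Span(name, parent, self.clock())
        self._stack.append(name)
        try:
            yield current
        finally:
            self._stack.pop()
            current.end = self.clock()
            self.spans.append(current)

    def slowest_child(self, parent: str) -> Span:
        """Return the direct child of `parent` with the longest duration."""
        children = [s for s in self.spans if s.parent == parent]
        return max(children, key=lambda s: s.duration)
```

Wrapping each downstream call made while handling a request in its own `tracer.span(...)` block, then calling `slowest_child` on the request's root span, points to the call that contributed most to end-to-end latency. Viewing a trace in Cloud Trace helps answer the same question across services and deployments.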


Cloud Debugger

Cloud Debugger lets you inspect and analyze the behavior of your production code in real time without affecting its performance.


Cloud Profiler

Poorly performing code increases the latency and cost of applications and web services. Cloud Profiler helps you identify and address performance issues by continuously analyzing the performance of CPU- or memory-intensive functions that run across an application.
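Cloud Profiler collects this kind of data continuously from running services. To illustrate the underlying idea locally, the following sketch uses Python's standard-library `cProfile` and `pstats` modules (not the Cloud Profiler agent) to rank functions by cumulative time for a single request. The functions `parse_records`, `checksum`, `handle_request`, and `profile_request` are hypothetical examples.

```python
import cProfile
import pstats
from typing import List, Tuple


def parse_records(n: int) -> List[str]:
    # Deliberately CPU-heavy: formats n strings one at a time.
    return [f"record-{i:08d}" for i in range(n)]


def checksum(records: List[str]) -> int:
    # Cheap by comparison: a single pass using built-ins.
    return sum(map(len, records))


def handle_request(n: int) -> int:
    return checksum(parse_records(n))


def profile_request(n: int = 200_000, limit: int = 5) -> List[Tuple[str, float]]:
    """Profile one handle_request call and return the `limit` functions with
    the largest cumulative time as (function name, seconds) pairs."""
    profiler = cProfile.Profile()
    profiler.enable()
    handle_request(n)
    profiler.disable()
    # get_stats_profile() requires Python 3.9 or later.
    profile = pstats.Stats(profiler).get_stats_profile()
    ranked = sorted(profile.func_profiles.items(),
                    key=lambda item: item[1].cumtime, reverse=True)
    return [(name, fp.cumtime) for name, fp in ranked[:limit]]
```

A one-off local profile like this captures a single run on one machine; continuous profiling of production traffic is what reveals which functions consistently dominate CPU or memory under real load.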


Recommendations

  • Use Cloud Trace to instrument your applications.
  • Use Cloud Debugger to provide real-time production debugging capabilities.
  • Use Cloud Profiler to analyze the operating performance of your applications.

What's next

Explore the other categories of the Google Cloud Architecture Framework.