Understanding the performance of production systems is notoriously difficult. Measurements taken in test environments usually fail to reflect the pressures on a production system. Microbenchmarking parts of the code is sometimes feasible, but it also typically fails to replicate the workload and behavior of a production system.
Continuous profiling of production systems is an effective way to discover where resources like CPU cycles and memory are consumed as a service operates in its working environment. But profiling adds load to the production system: to be an acceptable way to discover patterns of resource consumption, that additional load must be small.
Stackdriver Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications. It attributes that information to the source code that generated it, helping you identify the parts of the code that are consuming the most resources, and otherwise illuminating the performance characteristics of the code.
Environment and languages
Stackdriver Profiler runs on Linux in the following Google Cloud Platform environments:
- Compute Engine
- Google Kubernetes Engine (GKE)
- App Engine flexible environment
- App Engine standard environment
Stackdriver Profiler can profile code written in the following languages:
See Profiling agent for more information on running the profiling agent.
Profiler supports different types of profiling based on the language in which a program is written. The following table summarizes the supported profile types by language:
1 For App Engine standard environment, Go 1.11 or later is required.
2 Only available for App Engine standard environment.
3 Not available for App Engine standard environment.
4 Only available for Python 3.2 and higher.
5 Only available for Python 3.6 and higher.
For more information about these profile types, see Profiling concepts.
Stackdriver Profiler collects profiling data, usually for 10 seconds, once every minute from a single instance of the configured service in a single Compute Engine zone, and creates one profile from that data. If, for example, your GKE service runs 10 replicas of a pod, then in a 10-minute period roughly 10 profiles are created, and each pod is profiled approximately once. The profiling period is randomized, so the actual numbers vary. See Profile collection for more information.
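The replica example above can be sketched as a small simulation. It assumes, purely for illustration, that each one-minute collection picks one replica uniformly at random; the actual selection mechanism is not described here, and the function name `simulate_collection` is hypothetical.

```python
import random
from collections import Counter


def simulate_collection(replicas: int, minutes: int, seed: int = 0) -> Counter:
    """Count how many profiles each replica contributes.

    Assumption (for illustration only): one replica is chosen uniformly
    at random for each one-minute collection slot.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(minutes):
        # One profile per minute, taken from a single replica.
        counts[rng.randrange(replicas)] += 1
    return counts


counts = simulate_collection(replicas=10, minutes=10)
total_profiles = sum(counts.values())
print(f"profiles created: {total_profiles}")
print(f"average profiles per replica: {total_profiles / 10:.1f}")
print(f"replicas profiled at least once: {len(counts)} of 10")
```

The total is always one profile per minute, so the average is one profile per replica, but under this random-selection assumption some replicas may be profiled more than once in a given window and others not at all.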
While data is being collected, CPU and heap-allocation profiling adds less than 5 percent overhead. Amortized over the execution time and across multiple replicas of a service, the overhead is commonly less than 0.5 percent, making it an affordable option for always-on profiling in production systems.
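The amortized figure follows from simple arithmetic, sketched below. The sketch assumes that only one replica is profiled in each collection interval and treats the 5 percent figure as an upper bound during collection; the function name `amortized_overhead` and its parameters are hypothetical.

```python
def amortized_overhead(peak_overhead: float, profile_seconds: float,
                       interval_seconds: float, replicas: int) -> float:
    """Average overhead per replica when one replica is profiled per interval."""
    if replicas < 1 or interval_seconds <= 0:
        raise ValueError("replicas must be >= 1 and interval_seconds > 0")
    # Fraction of wall-clock time that a given replica spends being profiled.
    profiled_fraction = profile_seconds / (interval_seconds * replicas)
    return peak_overhead * profiled_fraction


# Figures from the text: under 5% while collecting, 10 s out of every 60 s.
for n in (2, 10):
    pct = 100 * amortized_overhead(0.05, 10, 60, n)
    print(f"{n} replicas: at most about {pct:.3f}% average overhead")
```

Under these assumptions, two replicas already bring the average below 0.5 percent, and ten replicas bring it below 0.1 percent.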
Stackdriver Profiler consists of the profiling agent, which collects the data, and a console interface on GCP, which lets you view and analyze the data collected by the agent.
You install the agent on the virtual machines where your application runs. The agent typically comes as a library that you attach to your code when you run it. The agent collects profiling data as the app runs.
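As a rough sketch of what attaching the agent can look like, the following assumes a Python service and the `google-cloud-profiler` package, whose `googlecloudprofiler.start` function starts the agent in a background thread. The helper `start_profiler` is hypothetical, and the import is guarded so that the sketch becomes a no-op when the package is not installed.

```python
def start_profiler(service: str, service_version: str) -> bool:
    """Try to start the profiling agent; return True if it started."""
    try:
        # Provided by the google-cloud-profiler package (assumed to be
        # installed where the service runs; often absent locally).
        import googlecloudprofiler
    except ImportError:
        return False
    try:
        googlecloudprofiler.start(
            service=service,
            service_version=service_version,
        )
    except (ValueError, NotImplementedError) as exc:
        # The agent rejects bad arguments and unsupported environments.
        print(f"profiler not started: {exc}")
        return False
    return True
```

A service would typically call something like `start_profiler("my-service", "1.0.0")` as early as possible in its entry point, where both arguments are placeholder values. The agents for other languages are attached differently; see the language-specific agent documentation.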
For information on running the Stackdriver Profiler agent, see:
You can also run the profiling agent on non-Google Cloud Platform systems. See Profiling outside Google Cloud Platform for more information.
After the agent has collected some profiling data, you can use the Profiler interface to see how the statistics for CPU and memory usage correlate with areas of the code.
Profile data is retained for 30 days, so you can analyze performance over any period within the last 30 days. To keep profiles longer, you can download them for long-term storage.
For information on using the Profiler interface, see Using the Profiler interface.