This page provides general background on profiling and describes the types of profiling available with Stackdriver Profiler.
What is profiling?
Profiling is a form of dynamic code analysis. That is, profiling lets you capture characteristics of the code as it runs (that's the dynamic aspect). It lets you look at the actual resource consumption or performance traits of the program.
Profiling your code during development and testing can help you optimize the design of the code and find bugs, reducing the risk of catastrophic failures in production.
Profiling production code can help you anticipate when future problems might arise and help diagnose problems that do occur.
Unlike static code analysis, which examines the source code of the application rather than the running application, profiling collects statistics about the running code and needs a way to retrieve those statistics from the execution environment. This work adds load to the program.
There are many ways to collect profiling data, and many ways to try to minimize the additional load on the running application. These typically involve tradeoffs between the accuracy of the profiled characteristics and the drag on the running application.
Stackdriver Profiler is a statistical, or sampling, profiler. It does not require pervasive changes to the program code to collect data. Instead, a component called the profiling agent attaches to the program, where it periodically looks at the call stack to collect information about, for example, CPU usage or memory usage.
Sampling profilers are typically less accurate and precise, because they sample the profiled traits, but they have minimal impact on the performance of the profiled application, which is particularly important when continuously profiling production code. Because the results are a statistical approximation, accuracy improves as the number of samples increases.
After collecting profiler data, you analyze it using the Profiler interface.
Types of profiling available
Stackdriver Profiler supports different types of profiling based on the language in which a program is written. The following table summarizes the supported profile types by language:
1 For App Engine standard environment, Go 1.11 or later is required.
2 Only available for App Engine standard environment.
3 Not available for App Engine standard environment.
4 Only available for Python 3.2 and higher.
5 Only available for Python 3.6 and higher.
CPU time is the time the CPU spends executing a block of code.
Wall-clock time (also called wall time) is the time it takes to run a block of code.
The CPU time for a function tells you how long it took to execute the code in the function. This measures the time the CPU was busy processing instructions. It doesn't include the time the CPU was waiting (or processing instructions for something else).
Wall-clock time for a function measures the time elapsed between entering and exiting the function. This includes time spent waiting: for database access, for thread synchronization, for locks, and so forth. The wall time for a block of code can never be less than its CPU time.
A block of code can take a long time to run but actually require little CPU time. If the wall time is much greater than the CPU time, the code spends a lot of time waiting for other things to happen. A block of code that spends a vast amount of its time waiting for other things to happen might indicate a resource bottleneck, where too many requestors are trying to access some limited resource.
If the CPU time is close to the wall time, the block of code is CPU intensive; almost all the time it takes to run is spent by the CPU. Long-running CPU-intensive blocks of code might be candidates for optimization: is there a more CPU-efficient way to do the work, something that involves fewer or faster operations?
In Stackdriver Profiler, profiling CPU time is supported for:
In Stackdriver Profiler, profiling wall-clock time is supported for:
Heap (memory) consumption
Heap consumption (also called heap) is the memory currently allocated in the program's heap.
Heap allocation (also called heap alloc) is all memory that was allocated in the program's heap, including memory that is no longer in use.
As programs run, they consume memory. They create objects; those objects take up space. They call functions; those functions take up space.
A well-behaved program uses memory efficiently and judiciously. It uses only the memory it needs; that is, it doesn't have an overly large memory footprint. It also returns that memory when it no longer needs it; that is, it doesn't leak memory.
A program that consumes more memory than it truly needs, or that holds onto memory it no longer needs, might start slowly, might gradually slow down or even crash, and might affect resources available to other applications. In a garbage-collected language, a program that allocates memory more frequently than it truly needs creates extra work for the garbage collector.
Profiling heap consumption helps you find potential inefficiencies and memory leaks in your programs. Profiling heap allocations helps you identify which allocations create the most work for the garbage collector.
In Stackdriver Profiler, profiling heap consumption is supported for:
In Stackdriver Profiler, profiling heap allocation is supported for:
Applications that create threads can suffer from blocked threads (threads that are created but never actually get to run) and from thread leaks, where the number of threads created keeps increasing. Blocked threads are one cause of thread leaks.
In Stackdriver Profiler, profiling thread usage is supported for Go. This profile captures information on goroutines, the Go concurrency mechanism, not operating-system threads.
In a multi-threaded program, the time spent waiting to serialize access to a shared resource can be significant. Understanding contention behavior can guide the design of the code and provide information for performance tuning.
In Stackdriver Profiler, profiling of mutex contention is supported for Go. This lets you determine the amount of time spent waiting for mutexes, as well as the frequency with which contention occurs.
For all profile types except heap consumption and thread usage, a single profile represents data collected for 10 seconds from an instance of the configured service in a single Compute Engine zone. On average, a profile is collected every minute.
This means that if you have 10 instances of a service running, each instance is sampled approximately once within a 10-minute period. Because some randomization is involved, in a given 10-minute period you might not see exactly 10 profiles, one from each instance.
Heap consumption and thread profiles are collected instantaneously rather than over a 10-second period, approximately once per minute. Heap allocation profiles, unlike heap consumption profiles, are collected over a 10-second period.