Containers & Kubernetes

Faster startup times for Kubernetes workloads with Kube Startup CPU Boost

February 7, 2024
Mikołaj Stefaniak

Strategic Cloud Engineer, Google Cloud

Abdelfettah Sghiouar

Senior Cloud Developer Advocate, Google Cloud


Despite Kubernetes’ many automation features, running containerized applications comes with some challenges. One of them is the need to define resources required by the application. Those are typically CPU and memory but may also include local storage. Kubernetes provides a way of configuring resources for an application in the Pod templates.

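For illustration, a Pod template might declare resources along these lines (the container name, image, and values here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo-app
      image: example.com/demo-app:1.0   # hypothetical image
      resources:
        requests:
          cpu: "1"          # guaranteed share, used by the scheduler
          memory: 512Mi
        limits:
          cpu: "1"          # hard cap enforced at runtime
          memory: 512Mi
```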

But what if the application's resource needs vary over time? One option is to configure larger requests to cover peak resource needs. This is not an optimal approach: it results in resource underutilization and generates unnecessary cost for infrastructure that is not used all the time.

Java Virtual Machine resource usage patterns

Java applications typically need different resources over time. Java is a dynamic, interpreted language based on the principle of "write once, run anywhere." It achieves this by producing universal bytecode instead of architecture-specific machine code, requiring the Java Virtual Machine (JVM) to run applications. The JVM usually needs more resources during startup and much less once running. This is due to compute-intensive operations during initial class loading and just-in-time optimization. As the JVM takes advantage of multi-threading, allocating more CPU resources usually reduces startup times.

Containerizing Java applications

Containers have become the de facto way to deploy and run applications in the cloud. Container platforms provide portability by design, therefore JVM portability is not useful when running in a container. Companies moving to the cloud and running containers often look for elasticity for their workloads. The ability to dynamically scale up and down when needed also means paying less for the resources used. The long startup times of containerized JVM applications make it problematic to leverage the elasticity features of container runtimes in the cloud.

One possible solution is compiling Java code to native machine code ahead of time. This allows Java applications to run without the JVM and provides faster startup and better performance. GraalVM, for example, is a Java Development Kit that supports this way of building the code. Using this approach comes with other challenges though, often requiring application modernization efforts. So companies would prefer to use the JVM if the container platform can dynamically allocate compute resources as they are needed.

Dynamic resource scaling in Kubernetes and CPU Boost

Kubernetes version 1.27 introduced a new feature called in-place resource resize, which allows you to resize Pod resources without restarting the containers. To enable this, the resources field in a Pod's container spec is now mutable for CPU and memory. This feature is still alpha.
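As a sketch, on a cluster with the InPlacePodVerticalScaling feature gate enabled, you can change a running Pod's CPU by patching its container resources; the Pod and container names below are hypothetical, and on newer Kubernetes versions this may go through a dedicated resize subresource instead:

```shell
# Bump CPU requests and limits of a running Pod without a restart
kubectl patch pod demo-app --patch \
  '{"spec":{"containers":[{"name":"demo-app","resources":{"requests":{"cpu":"2"},"limits":{"cpu":"2"}}}]}}'
```

Whether a container can be resized without a restart is governed by its resizePolicy, which defaults to allowing restart-free CPU changes.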

A solution that benefits from in-place resource resize is Kube Startup CPU Boost, a Kubernetes operator that increases CPU resources for Pods. The resource update happens before the cluster schedules the Pod on a node. Once the containers are ready, the operator updates their resources to the original values. Thanks to the in-place resource resize feature, this operation does not force the Pod restart.

Kube Startup CPU Boost is open source. It aims to solve the use case of applications that need extra resources during startup, and its use is not limited to containerized JVM applications.

You can install Kube Startup CPU Boost with the command below. As a prerequisite, the cluster needs to have the InPlacePodVerticalScaling feature gate enabled.

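The install step applies the operator's released manifests to the cluster; the exact URL pattern and release tag below are an assumption, so check the project's releases page for the current command:

```shell
# Install the Kube Startup CPU Boost operator (release URL is an assumption)
kubectl apply -f https://github.com/google/kube-startup-cpu-boost/releases/latest/download/manifests.yaml
```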

Once installed, you can configure CPU boosts for your applications. Let's first deploy a sample Java application and check its startup time without the boost. For this purpose, you can use a demo application with the below characteristics:

  • Created with Spring Boot 3 framework
  • Exposes data on the REST endpoint
  • Fetches the data from the database using Spring Data and Java Persistence API
  • Runs in a container with Java 17, executed as a "fat jar"
  • Uses CPU requests and limits of 1 core
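A Deployment for such an application could look like the following; the names, labels, and image are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-demo-app
  template:
    metadata:
      labels:
        app: spring-demo-app
    spec:
      containers:
        - name: spring-demo-app
          image: example.com/spring-demo-app:1.0   # hypothetical fat-jar image on Java 17
          resources:
            requests:
              cpu: "1"
            limits:
              cpu: "1"
```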

The below command checks the startup time of the Spring application from its logs. In our case it was around 18 seconds on average on a GKE cluster with e2-standard-4 nodes.

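Spring Boot prints its startup time on boot, so one way to check it is to grep the Pod logs; the Deployment name here is hypothetical, and the log line shown is just the typical Spring Boot format:

```shell
# Spring Boot logs a line like:
#   Started DemoApplication in 18.3 seconds (process running for 19.1)
kubectl logs deployment/spring-demo-app | grep "Started"
```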

Let's repeat the same but this time with a Startup CPU Boost configuration. Increase container CPU requests and limits by 100% (to 2 cores) until the Pod reaches Ready condition. To achieve this, apply the below configuration in your application's namespace.

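A boost configuration along the following lines expresses that intent. The group/version and field names follow the project's v1alpha1 API, but treat the exact schema as an assumption and verify it against the project documentation; the names and labels are hypothetical:

```yaml
apiVersion: autoscaling.x-k8s.io/v1alpha1
kind: StartupCPUBoost
metadata:
  name: boost-spring-demo
  namespace: demo                  # must be the application's namespace
spec:
  selector:
    matchLabels:
      app: spring-demo-app         # must match the target Pods' labels
  resourcePolicy:
    containerPolicies:
      - containerName: spring-demo-app
        percentageIncrease:
          value: 100               # 1 core -> 2 cores during startup
  durationPolicy:
    podCondition:
      type: Ready                  # revert resources once the Pod is Ready
      status: "True"
```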

Next, remove the previous deployment and create it again.

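For example, assuming the Deployment was created from a local manifest file (the file name is hypothetical):

```shell
kubectl delete deployment spring-demo-app
kubectl apply -f spring-demo-app.yaml   # hypothetical manifest file
```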

This time the Startup CPU Boost kicks in and increases container CPU resources. Using the same command as before to check the application's startup time, we measured around 9 seconds in our testing, about two times faster.

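To watch the boost in action, you can inspect the container's effective CPU values while the Pod is starting and again after it reaches Ready; the label selector is hypothetical:

```shell
# During startup this should show the boosted values (e.g. cpu: "2"),
# and the original values (cpu: "1") once the Pod is Ready
kubectl get pod -l app=spring-demo-app \
  -o jsonpath='{.items[0].spec.containers[0].resources}'
```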

What happened under the hood?

(Diagram: Kube Startup CPU Boost architecture — https://storage.googleapis.com/gweb-cloudblog-publish/images/01._kube-startup-cpu-boost.max-1000x1000.png)

Kube Startup CPU Boost uses a mutating admission webhook to increase container resources. The webhook receives admission requests for new Pods and queries the Boost Manager component for a matching boost configuration. Once found, it increases CPU resource requests and limits as configured. Once the Pod reaches the desired status, the manager reverts container resources to their original values, all without restarting Pods, thanks to the Kubernetes in-place Pod resize feature.

Caveats and limitations

Administrators should consider this solution when planning cluster capacity and choosing a node configuration. Without enough capacity, the cluster won't be able to schedule boosted Pods. Securing extra CPU resources on nodes for faster startup is a tradeoff between speed and cost. Thanks to the in-place resize feature, those resources will be available for other applications after a short time. This makes total overhead smaller when compared to running over-provisioned Pods.

Cluster autoscaler users should also exercise caution when using this solution. It is not recommended with autoscalers that aggressively optimize utilization: as the boost manager decreases a Pod's initial resources, the autoscaler may consider a node underutilized, which in turn may trigger a scale-down action and reschedule Pods to different nodes.

Summary

In this article we described Kubernetes resource management for Pods and containers. The process can be suboptimal for applications whose resource needs change over time. One example is Java applications running on the JVM in containers, which need more CPU resources during the startup phase and less once running. To guarantee fast startup, you would have to run Pods with inflated CPU resource requests that cover peak needs; setting higher resource limits alone is not sufficient, since spare capacity on the node is not guaranteed.

The new Kubernetes in-place Pod resize feature aims to solve this problem, and the Kube Startup CPU Boost solution demonstrates how to leverage it. It is a targeted solution for applications that need extra CPU resources during the startup phase. The CPU resources are decreased once the application is up and running, and thanks to in-place resource resize, this operation does not restart the Pods.
