Optimize Python applications for Cloud Run


This guide describes optimizations for Cloud Run services written in the Python programming language, along with background information to help you understand the tradeoffs involved in some of the optimizations. The information on this page supplements the general optimization tips, which also apply to Python.

Many of the best practices and optimizations for traditional Python web-based applications revolve around:

  • Handling concurrent requests (both thread-based and non-blocking I/O)
  • Reducing response latency by using connection pooling and by deferring non-critical work, such as sending traces and metrics, to background tasks.
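The second point can be sketched as a background worker fed by a queue, so the request handler returns without waiting for telemetry to be sent. This is a minimal illustration; the task names and payloads are hypothetical, and a real service would send traces or metrics instead of printing:

```python
import queue
import threading

# Queue of deferred, non-critical tasks (name, payload) pairs.
tasks: "queue.Queue[tuple]" = queue.Queue()

def worker() -> None:
    while True:
        name, payload = tasks.get()
        # In a real service this would emit a trace or metric.
        print(f"flushed {name}: {payload}")
        tasks.task_done()

# A single daemon thread drains the queue in the background.
threading.Thread(target=worker, daemon=True).start()

def handle_request() -> str:
    # Critical path: compute the response.
    response = "ok"
    # Non-critical path: enqueue telemetry instead of sending it inline.
    tasks.put(("request_latency_ms", 12))
    return response

print(handle_request())
tasks.join()  # for demonstration only: wait for the background flush
```

Note that background work like this relies on the instance having CPU available after the response is sent; see the note on CPU allocation later in this page.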

Optimize the container image

By optimizing the container image, you can reduce load and startup times. You can optimize the image by:

  • Only putting into your container what your app needs at runtime
  • Optimizing the WSGI server

Only put into your container what your app needs at runtime

Consider which components are included in the container, and whether they are required for the execution of the service. There are multiple ways to minimize the container image:

  • Use a smaller base image
  • Move large files outside of the container

Use a smaller base image

Docker Hub provides a number of official Python base images that you can use, if you choose not to install Python from source within your containers. These are based on the Debian operating system.

If you are using Docker Hub's python image, consider using the slim version. These images are smaller because they omit a number of packages used, for example, to build wheels, which your application may not need. For instance, the full python image includes the GNU C compiler, preprocessor, and core utilities.

To identify the ten largest packages in a base image, you can run the following command:

DOCKER_IMAGE=python # or python:slim
docker run --rm ${DOCKER_IMAGE} dpkg-query -Wf '${Installed-Size}\t${Package}\t${Description}\n' | sort -n | tail -n10 | column -t -s $'\t'

Because they contain fewer of these low-level packages, slim-based images also offer a smaller attack surface for potential vulnerabilities. Note that these images may not include the elements required to build wheels from source.

You can add specific packages back in by adding a RUN apt install line to your Dockerfile. See more about using System Packages in Cloud Run.
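As a sketch, a line like the following reinstalls the GNU C compiler on a slim image (the package name is illustrative; clearing the apt lists keeps the layer small):

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends gcc \
    && rm -rf /var/lib/apt/lists/*
```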

There are also options for non-Debian based containers. The python:alpine option may result in a much smaller container, but many Python packages may not have pre-compiled wheels that support alpine-based systems. Support is improving (see PEP-656), but continues to be varied. You can also consider using the distroless base image, which does not contain any package managers, shells or any other programs.

Move large files outside of the container

Large files, such as media assets, do not need to be included in the base container.

Google Cloud offers multiple hosting options, like Cloud Storage, to store these large items. Move large assets to these services, then reference them from your application at run time.
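For publicly readable objects, referencing an asset at run time can be as simple as building its Cloud Storage URL instead of bundling the file. A minimal sketch, where the bucket name `my-assets` is an assumption:

```python
# Hypothetical bucket holding large media that used to live in the image.
ASSET_BUCKET = "my-assets"

def asset_url(object_name: str) -> str:
    # Publicly readable Cloud Storage objects are served from this URL form.
    return f"https://storage.googleapis.com/{ASSET_BUCKET}/{object_name}"

print(asset_url("video/intro.mp4"))
```

For private objects, use a Cloud Storage client library with signed URLs instead of public links.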

Optimize the WSGI server

Python standardizes the way applications interact with web servers through the WSGI standard, PEP-3333. One of the more common WSGI servers is gunicorn, which is used in much of the sample documentation.

Optimize gunicorn

The CMD part of the Dockerfile shows an optimized invocation of gunicorn:

# Use the official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.10-slim

# Allow statements and log messages to immediately appear in the logs
ENV PYTHONUNBUFFERED True

# Copy local code to the container image.
COPY . ./

# Install production dependencies.
RUN pip install --no-cache-dir -r requirements.txt

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

If you are considering changing these settings, adjust the number of workers and threads on a per-application basis. For example, try a number of workers equal to the cores available, verify that performance improves, and then adjust the number of threads. Setting too many workers or threads can have a negative impact, such as longer cold start latency, higher memory consumption, and lower requests per second.
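One way to keep the worker count tied to the available cores is a gunicorn configuration file rather than command-line flags. A minimal sketch, started with `gunicorn -c gunicorn.conf.py main:app`; the values are starting points to measure against, not recommendations:

```python
# gunicorn.conf.py (sketch): gunicorn reads these module-level settings.
import multiprocessing

workers = multiprocessing.cpu_count()  # one worker per available core
threads = 8                            # tune per application after measuring
timeout = 0                            # let Cloud Run manage instance scaling
```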

Adding the --preload setting can help to:

  • Identify serious runtime bugs at deploy time
  • Save memory resources

You should consider what your application is preloading before adding this.

Other WSGI servers

You are not restricted to using gunicorn for running Python in containers. You can use any WSGI or ASGI web server, as long as the container listens on HTTP port $PORT, as per the Container runtime contract.

Common alternatives include uwsgi, uvicorn, and waitress.

For example, given file named main.py containing the app object, the following invocations would start a WSGI server:

# uwsgi: pip install pyuwsgi
uwsgi --http :$PORT -s /tmp/app.sock --manage-script-name --mount /app=main:app

# uvicorn: pip install uvicorn
uvicorn --port $PORT --host 0.0.0.0 main:app

# waitress: pip install waitress
waitress-serve --port $PORT main:app

These can either be added as a CMD exec line in a Dockerfile, or as a web: entry in Procfile when using Google Cloud Buildpacks.
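For example, with Buildpacks the gunicorn invocation shown earlier could live in a Procfile:

```
web: gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
```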

Optimize applications

In your Cloud Run service code, you can also optimize for faster startup times and memory usage.

Reduce threads

You can optimize memory by reducing the number of threads, by using non-blocking reactive strategies and avoiding background activities. Also avoid writing to the file system, as mentioned in the general tips page.
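As an illustration of the non-blocking approach, a single asyncio event loop can service many concurrent waits without one thread per request. This is a standalone sketch, not a full ASGI application; the sleep stands in for non-blocking I/O such as a database call:

```python
import asyncio

async def handle(i: int) -> str:
    await asyncio.sleep(0.05)  # stands in for non-blocking I/O
    return f"response {i}"

async def main() -> list:
    # 50 concurrent "requests" share a single thread.
    return await asyncio.gather(*(handle(i) for i in range(50)))

results = asyncio.run(main())
print(len(results))  # 50
```

In a real service, an ASGI server such as uvicorn runs the event loop for you.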

If you want to support background activities in your Cloud Run service, set your Cloud Run service CPU to be always allocated so you can run background activities outside of requests and still have CPU access.

Reduce startup tasks

Python web-based applications can have many tasks to complete during startup, such as preloading data, warming up caches, and establishing connection pools. These tasks can be slow when executed sequentially. If you want them to execute in parallel, increase the number of CPU cores available to the instance.
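Independent startup tasks can be run concurrently with a thread pool. A minimal sketch in which the task names and timings are illustrative (the sleeps stand in for I/O-bound warm-up work):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def warm_cache() -> str:
    time.sleep(0.2)  # stands in for real warm-up I/O
    return "cache warmed"

def open_db_pool() -> str:
    time.sleep(0.2)
    return "pool opened"

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    # Submit both tasks so their waits overlap instead of adding up.
    futures = [pool.submit(warm_cache), pool.submit(open_db_pool)]
    results = [f.result() for f in futures]
elapsed = time.monotonic() - start

print(results, round(elapsed, 2))
```

Because the two 0.2 s tasks overlap, total wall time stays well under the 0.4 s a sequential run would take. CPU-bound startup work benefits from extra cores rather than extra threads.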

Cloud Run currently triggers a cold start with a real user request, so a user whose request is assigned to a newly started instance may experience a long delay. Cloud Run currently does not have a "readiness" check to avoid sending requests to applications that are not ready.

What's next

For more tips, see the general optimization tips.