This guide describes optimizations for Cloud Run services written in the Python programming language, along with background information to help you understand the tradeoffs involved in some of the optimizations. The information on this page supplements the general optimization tips, which also apply to Python.
Many of the best practices and optimizations for these traditional Python web-based applications revolve around:
- Handling concurrent requests (both thread-based and non-blocking I/O)
- Reducing response latency by using connection pooling and by moving non-critical work, for example sending traces and metrics, to background tasks
Optimize the container image
By optimizing the container image, you can reduce load and startup times. You can optimize the image by:
- Only putting into your container what your app needs at runtime
- Optimizing the WSGI server
Only put into your container what your app needs at runtime
Consider which components are included in the container, and whether they are required for the execution of the service. There are multiple ways to minimize the container image:
- Use a smaller base image
- Move large files outside of the container
Use a smaller base image
Docker Hub provides a number of official Python base images that you can use, if you choose not to install Python from source within your containers. These are based on the Debian operating system.
If you are using Docker Hub's python image, consider using the slim version. These images are smaller because they do not include a number of packages that are used, for example, to build wheels from source, which you may not need to do for your application. For example, the full python image comes with the GNU C compiler, preprocessor, and core utilities.
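For example, a minimal Dockerfile sketch using the slim variant might look like the following. The Python version tag, the requirements.txt file, and the main:app entry point are assumptions to adjust for your application:

# A sketch: slim base image with only runtime dependencies installed
FROM python:3.12-slim

WORKDIR /app

# Install only what the app needs at runtime
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# gunicorn invocation discussed in "Optimize gunicorn" later in this guide
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app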
To identify the ten largest packages in a base image, you can run the following command:
DOCKER_IMAGE=python # or python:slim
docker run --rm ${DOCKER_IMAGE} dpkg-query -Wf '${Installed-Size}\t${Package}\t${Description}\n' | sort -n | tail -n10 | column -t -s $'\t'
Because there are fewer of these low-level packages, the slim-based images also offer a smaller attack surface for potential vulnerabilities. Note that these images may not include the elements required to build wheels from source.
You can add specific packages back in by adding a RUN apt install line to your Dockerfile. See more about using System Packages in Cloud Run.
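For example, a sketch that restores build tooling in a slim image. build-essential is one option here; install only what your build actually needs:

# Sketch: add a compiler toolchain back into a slim image, then remove the
# apt package lists to keep the layer small
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*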
There are also options for non-Debian based containers. The python:alpine option may result in a much smaller container, but many Python packages may not have pre-compiled wheels that support alpine-based systems. Support is improving (see PEP-656), but continues to be varied. You can also consider using the distroless base image, which does not contain any package managers, shells, or any other programs.
Move large files outside of the container
Large files, such as media assets, do not need to be included in the container image. Google Cloud offers multiple hosting options, such as Cloud Storage, to store these large items. Move large assets to these services, then reference them from your application at run time.
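For example, a minimal sketch using the google-cloud-storage client library; the bucket and object names are placeholders:

# Sketch: download a large asset from Cloud Storage at run time.
# "my-assets-bucket" and "media/intro.mp4" are hypothetical names.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-assets-bucket").blob("media/intro.mp4")
# Note: on Cloud Run, /tmp is an in-memory file system, so downloads there
# count against the instance's memory.
blob.download_to_filename("/tmp/intro.mp4")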
Optimize the WSGI server
Python has standardized the way that applications can interact with web servers by implementing the WSGI standard, PEP-3333. One of the more common WSGI servers is gunicorn, which is used in much of the sample documentation.
Optimize gunicorn
Add the following CMD to the Dockerfile to optimize the invocation of gunicorn:
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
If you change these settings, adjust the number of workers and threads on a per-application basis. For example, try using a number of workers equal to the available cores and verify that performance improves, then adjust the number of threads. Setting too many workers or threads can have a negative impact, such as longer cold start latency, higher memory consumption, and lower requests per second.
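If you prefer to keep this tuning out of the Dockerfile, gunicorn also reads a gunicorn.conf.py file from the working directory by default. A sketch that sizes workers to the visible cores; the values here are starting points to measure, not recommendations:

# gunicorn.conf.py -- picked up automatically from the working directory
import multiprocessing

workers = multiprocessing.cpu_count()  # one worker per visible core
threads = 8
timeout = 0  # rely on Cloud Run's request timeout instead of gunicorn's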
By default, gunicorn spawns workers and listens on the specified port when starting up, even before evaluating your application code. In this case, you should set up custom startup probes for your service, because the default Cloud Run startup probe marks a container instance as healthy as soon as it starts listening on $PORT.
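A custom startup probe can be declared in the service YAML. A sketch, assuming your application serves a /ready endpoint (a hypothetical path) once it can accept traffic:

# Excerpt from a Cloud Run service definition (under the container spec)
startupProbe:
  httpGet:
    path: /ready   # hypothetical readiness endpoint in your app
    port: 8080
  periodSeconds: 1
  failureThreshold: 30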
If you want to change this behavior, you can invoke gunicorn with the --preload setting to evaluate your application code before listening, as shown in the sketch after this list. This can help to:
- Identify serious runtime bugs at deploy time
- Save memory resources
You should consider what your application is preloading before adding this.
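For example, the earlier gunicorn invocation with preloading enabled:

# Same invocation as above, with --preload so application code is evaluated
# before gunicorn starts listening
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 --preload main:app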
Other WSGI servers
You are not restricted to using gunicorn for running Python in containers. You can use any WSGI or ASGI web server, as long as the container listens on HTTP port $PORT, as per the Container runtime contract.
Common alternatives include uwsgi, uvicorn, and waitress.
For example, given a file named main.py containing the app object, the following invocations would each start a server:
# uwsgi: pip install pyuwsgi
uwsgi --http :$PORT -s /tmp/app.sock --manage-script-name --mount /app=main:app
# uvicorn: pip install uvicorn
uvicorn --port $PORT --host 0.0.0.0 main:app
# waitress: pip install waitress
waitress-serve --port $PORT main:app
These can be added either as a CMD exec line in a Dockerfile, or as a web: entry in a Procfile when using Google Cloud's buildpacks.
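For example, a minimal Procfile sketch for buildpacks, reusing the gunicorn invocation from earlier:

# Procfile
web: gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app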
Optimize applications
In your Cloud Run service code, you can also optimize for faster startup times and memory usage.
Reduce threads
You can optimize memory usage by reducing the number of threads, using non-blocking reactive strategies, and avoiding background activities. Also avoid writing to the file system, as mentioned in the general tips page.
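As an illustration of the non-blocking approach, the following is a minimal ASGI sketch: a single event-loop thread serves concurrent requests through asynchronous I/O instead of one thread per request. It can be run with the uvicorn invocation shown earlier.

# main.py -- minimal ASGI app; concurrency comes from the event loop,
# not from additional threads
async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, Cloud Run!"})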
If you want to support background activities in your Cloud Run service, set your Cloud Run service CPU to be always allocated so you can run background activities outside of requests and still have CPU access.
Reduce startup tasks
Python web-based applications can have many tasks to complete during startup, for example preloading data, warming the cache, and establishing connection pools. When executed sequentially, these tasks can be slow. If you want them to execute in parallel, increase the number of CPU cores.
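For I/O-bound startup work, the tasks can also be overlapped without extra cores. A sketch, where warm_cache and open_db_pool are hypothetical placeholders for your own startup routines:

# Sketch: run independent startup tasks concurrently at module import time
from concurrent.futures import ThreadPoolExecutor

def warm_cache():
    ...  # e.g., preload reference data

def open_db_pool():
    ...  # e.g., establish database connections

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(warm_cache), executor.submit(open_db_pool)]
    for future in futures:
        future.result()  # re-raises any startup error immediately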
Cloud Run currently sends a real user request to trigger a cold start instance, so users whose requests are assigned to a newly started instance may experience long delays. Cloud Run currently does not have a "readiness" check to avoid sending requests to applications that are not ready.
What's next
For more tips, see