This page provides guidance and best practices for running WebSockets or other streaming services on Cloud Run and writing clients for such services.
WebSockets applications are supported on Cloud Run with no additional configuration required. However, WebSockets streams are HTTP requests and are still subject to the request timeout configured for your Cloud Run service. Make sure this setting suits your use of WebSockets, for example by implementing reconnects in your clients.
On Cloud Run, session affinity isn't available, so WebSockets requests can potentially end up at different container instances, due to built-in load balancing. You need to synchronize data between container instances to solve this problem.
Note that WebSockets on Cloud Run are also supported if you are using Cloud Load Balancing.
Deploying a sample WebSockets application
You can deploy a sample whiteboard application implemented using WebSockets to Cloud Run.
To deploy this sample application manually, you can follow these steps:
Clone the Socket.IO repository locally using the git command-line tool:
git clone https://github.com/socketio/socket.io.git
Navigate into the sample application directory:
Deploy a new Cloud Run service by building the application from source code using the gcloud command-line tool:
gcloud beta run deploy whiteboard --allow-unauthenticated --source=.
After the service is deployed, open two separate browser tabs and navigate to the service URL. Anything you draw in one tab should propagate to the other tab (and vice versa) since the clients are connected to the same container instance over WebSockets.
The most difficult part of creating WebSockets applications on Cloud Run is synchronizing data between multiple Cloud Run container instances. This is difficult because of the autoscaling and stateless nature of container instances, and because of the limits for concurrency and request timeouts.
Handling request timeouts and client reconnects
WebSockets requests are treated as long-running HTTP requests in Cloud Run. They are subject to request timeouts (currently up to 60 minutes, with a default of 5 minutes) even if your application server does not enforce any timeouts.
Accordingly, if the client keeps the connection open longer than the timeout configured for the Cloud Run service, the client is disconnected when the request times out.
Therefore, WebSockets clients connecting to Cloud Run should handle reconnecting to the server if the request times out or the server disconnects. You can achieve this in browser-based clients by using libraries such as reconnecting-websocket or, if you are using the Socket.IO library, by handling "disconnect" events.
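The reconnect behavior described above can be sketched independently of any particular WebSockets library. The following is a minimal illustration, not the method used by reconnecting-websocket or Socket.IO: `backoff_delay` and `connect_with_retry` are hypothetical helper names, the `connect` argument stands in for whatever function opens your WebSocket, and the base delay, cap, and attempt count are assumed values you would tune.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: Optional[Callable[[], float]] = None) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based)."""
    jitter = jitter or random.random
    # Exponential growth, limited by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Scale by a jitter factor so that many clients dropped at the same
    # moment (for example by a request timeout) do not reconnect in lockstep.
    return ceiling * jitter()


def connect_with_retry(connect: Callable[[], T], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or `max_attempts` is exhausted."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Give up and surface the last error to the caller.
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

A client would typically call `connect_with_retry` again whenever an established connection closes, since a Cloud Run request timeout looks like an ordinary disconnect from the client's point of view.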
Billing incurred when using WebSockets
A Cloud Run instance that has at least one open WebSocket connection is considered active and is therefore billed.
WebSockets applications are typically designed to handle many connections simultaneously. Since Cloud Run supports concurrent connections (up to 250 per container), Google recommends that you increase the maximum concurrency setting for your container to a value higher than the default, provided your service can handle the load with the given resources.
About sticky sessions (session affinity)
Because WebSockets connections are stateful, the client stays connected to the same container on Cloud Run throughout the lifespan of the connection. This naturally offers session stickiness within the context of a single WebSockets session.
However, because Cloud Run automatically scales container instances and load balances every request between available container instances, it does not offer session stickiness across requests. Subsequent requests from the same client are load balanced randomly between container instances and may be routed to a different container instance.
Due to load balancing between container instances, clients connecting to your Cloud Run service may end up being serviced by different container instances that do not coordinate or share data. To mitigate this, you need to use external data storage to synchronize state between Cloud Run instances, which is explained in the next section.
Synchronizing data between container instances
Due to the stateless and autoscaling nature of Cloud Run container instances, clients connecting to a Cloud Run service might not receive the same data from the WebSockets connection.
For example, if you are building a chatroom application using WebSockets and set your maximum concurrency setting to 250, when more than 250 users connect to this service at the same time, they will be served by different container instances, and therefore, they will not be able to see the same messages in the chatroom.
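The arithmetic behind this example can be written out as a small sketch. The function name `min_instances_needed` is hypothetical, and the result is only a lower bound: the Cloud Run autoscaler may start additional instances or spread clients across instances before any single instance is full.

```python
def min_instances_needed(connected_clients: int, max_concurrency: int) -> int:
    """Lower bound on the instances required to hold all open connections.

    Each open WebSockets connection occupies one concurrency slot for as
    long as it stays open, so the bound is the ceiling of the ratio.
    """
    if connected_clients < 0 or max_concurrency < 1:
        raise ValueError("need connected_clients >= 0 and max_concurrency >= 1")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-connected_clients // max_concurrency)


print(min_instances_needed(250, 250))  # 1: everyone can share one instance
print(min_instances_needed(251, 250))  # 2: the chatroom is now split
```

As soon as the result exceeds 1, clients on different instances need a shared channel to see the same messages.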
To synchronize data between your Cloud Run container instances, such as receiving the messages posted to a chatroom on all instances, you need an external data storage system, such as a database or a message queue.
If you use an external database such as Cloud SQL, you can send messages to the database and poll the database periodically for new messages. However, note that Cloud Run instances do not have CPU allocated when the container is not handling any requests. If your service primarily handles WebSockets requests, then the container has CPU allocated as long as at least one client is connected to it.
Message queues work better than polling for synchronizing data between Cloud Run containers in real time. Because external message queues cannot address each container instance to "push" data, your application needs to "pull" new messages by establishing its own connection to the message queue.
Google recommends that you use external message queue systems such as Redis Pub/Sub (Memorystore) or Firestore real-time updates that can deliver updates to all instances over connections initiated by the container instance.
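The following sketch simulates this fan-out pattern in a single process to show the data flow. It is not a Redis or Firestore client: `InMemoryBroker` is a hypothetical stand-in for the external message queue, and `ContainerInstance` models one Cloud Run instance with a plain list per client in place of a real WebSockets connection. In a real service, the subscription would be a long-lived connection that the instance itself opens to the message queue.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Hypothetical stand-in for an external pub/sub channel server."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to every subscriber; return how many received it."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class ContainerInstance:
    """Models one instance holding its own set of connected clients."""

    def __init__(self, name: str, broker: InMemoryBroker,
                 channel: str = "chat") -> None:
        self.name = name
        self.broker = broker
        self.channel = channel
        self.clients: Dict[str, List[str]] = {}  # client id -> delivered messages
        # The instance initiates the subscription; the broker never needs
        # to know how to reach an individual instance.
        broker.subscribe(channel, self._on_message)

    def connect_client(self, client_id: str) -> None:
        self.clients[client_id] = []

    def post(self, message: str) -> None:
        """A local client posted a message: publish it for all instances."""
        self.broker.publish(self.channel, message)

    def _on_message(self, message: str) -> None:
        # Forward to every locally connected client.
        for inbox in self.clients.values():
            inbox.append(message)
```

With two instances sharing one broker, a message posted through either instance reaches the clients of both, including the clients of the instance that published it.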
Designing a chatroom service
To design a real-time WebSockets-based chatroom service on Cloud Run, you need to synchronize data between all container instances that the clients connect to. Using an external message queue with a pull-based subscription works well with the Cloud Run application model.
In this Redis-based architecture, each Cloud Run container instance
establishes a long-running connection to the Redis channel that contains the
received chat messages (using the
SUBSCRIBE command). Once the container
instances receive a new message on the channel, they can send it to their
clients over WebSockets in real-time.
Similarly, when a client posts a message to the chatroom using WebSockets, the
container instance that receives the message publishes the message to the Redis
channel (using the
PUBLISH command), and
other container instances that are subscribed to this channel will receive this