Building a WebSocket chat service for Cloud Run

This tutorial shows how to create a multi-room, realtime chat service using WebSockets with a persistent connection for bidirectional communication. With WebSockets, both client and server can push messages to each other without polling the server for updates.

Although Cloud Run autoscales the number of container instances to serve all traffic, Cloud Run does not provide session affinity. This means that any new request can be routed to a different container instance. As a result, user messages in the chat service need to be synchronized across all container instances, not just between the clients connected to one container instance.

Design overview

This sample chat service uses a Memorystore for Redis instance to store and synchronize user messages across all container instances. Redis's Pub/Sub mechanism (not to be confused with the Cloud Pub/Sub product) pushes data to subscribed clients connected to any container instance, eliminating the need for HTTP polling for updates.

However, even with push updates, any container instance that is spun up will only receive new messages pushed to the container. To load prior messages, message history would need to be stored and retrieved from a persistent storage solution. This sample uses Redis's conventional functionality of an object store to cache and retrieve message history.
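The history cache can be sketched with two small helpers. The names `addMessageToCache` and `getRoomFromCache` match functions used later in the server code, but this implementation is an illustrative assumption; it is written against any client object exposing Redis-style list commands with node-redis v4 method names (`lPush`, `lTrim`, `lRange`):

```javascript
// Illustrative sketch of the history helpers; the sample's actual
// implementation may differ.
const MAX_MESSAGES = 50; // assumed cap so the cache doesn't grow unbounded

// Prepend a message to a room's history list and trim old entries.
async function addMessageToCache(client, room, msg) {
  await client.lPush(room, JSON.stringify(msg));
  await client.lTrim(room, 0, MAX_MESSAGES - 1);
}

// Return a room's message history, oldest first, or null if empty.
async function getRoomFromCache(client, room) {
  const raw = await client.lRange(room, 0, -1);
  if (raw.length === 0) return null;
  return raw.reverse().map(s => JSON.parse(s));
}
```

Because the cache holds only recent messages, a database such as Firestore would still be needed for indefinite history, as noted under Limitations below.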

Architectural Diagram
The diagram shows multiple client connections to each Cloud Run container instance. Each container instance connects to a Memorystore for Redis instance via a Serverless VPC Access connector.

The Redis instance is protected from the internet using private IPs with access controlled and limited to services running on the same Virtual Private Network as the Redis instance; therefore a Serverless VPC Access connector is needed for the Cloud Run service to connect to Redis. Learn more about Serverless VPC Access.

Limitations

  • This tutorial does not show end user authentication or session caching. To learn more about end user authentication, refer to the Cloud Run tutorial for end user authentication.

  • This tutorial does not implement a database such as Firestore for indefinite storage and retrieval of chat message history.

  • Additional elements are needed for this sample service to be production ready. A Standard Tier Redis instance is recommended to provide high availability through replication and automatic failover.

Objectives

  • Write, build, and deploy a Cloud Run service that uses WebSockets.

  • Connect to a Memorystore for Redis instance to publish and subscribe to new messages across container instances.

  • Connect the Cloud Run service with Memorystore using a Serverless VPC Access connector.

Costs

This tutorial uses the following billable components of Google Cloud:

  • Cloud Run
  • Memorystore for Redis
  • Serverless VPC Access

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Cloud Run, Memorystore for Redis, Serverless VPC Access, Artifact Registry, and Cloud Build APIs.

    Enable the APIs

  5. Install and initialize the gcloud CLI.

Setting up gcloud defaults

To configure gcloud with defaults for your Cloud Run service:

  1. Set your default project:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with the ID of the project you created for this tutorial.

  2. Configure gcloud for your chosen region:

    gcloud config set run/region REGION

    Replace REGION with the supported Cloud Run region of your choice.

Cloud Run locations

Cloud Run is regional, which means the infrastructure that runs your Cloud Run services is located in a specific region and is managed by Google to be redundantly available across all the zones within that region.

Meeting your latency, availability, or durability requirements is the primary factor in selecting the region where your Cloud Run services run. You can generally select the region nearest to your users, but you should also consider the location of the other Google Cloud products that your Cloud Run service uses. Using Google Cloud products together across multiple locations can affect your service's latency as well as its cost.

Cloud Run is available in the following regions:

Subject to Tier 1 pricing

  • asia-east1 (Taiwan)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • europe-north1 (Finland) Low CO2
  • europe-southwest1 (Madrid) Low CO2
  • europe-west1 (Belgium) Low CO2
  • europe-west4 (Netherlands)
  • europe-west8 (Milan)
  • europe-west9 (Paris) Low CO2
  • us-central1 (Iowa) Low CO2
  • us-east1 (South Carolina)
  • us-east4 (Northern Virginia)
  • us-east5 (Columbus)
  • us-south1 (Dallas)
  • us-west1 (Oregon) Low CO2

Subject to Tier 2 pricing

  • asia-east2 (Hong Kong)
  • asia-northeast3 (Seoul, South Korea)
  • asia-southeast1 (Singapore)
  • asia-southeast2 (Jakarta)
  • asia-south1 (Mumbai, India)
  • asia-south2 (Delhi, India)
  • australia-southeast1 (Sydney)
  • australia-southeast2 (Melbourne)
  • europe-central2 (Warsaw, Poland)
  • europe-west2 (London, UK)
  • europe-west3 (Frankfurt, Germany)
  • europe-west6 (Zurich, Switzerland) Low CO2
  • northamerica-northeast1 (Montreal) Low CO2
  • northamerica-northeast2 (Toronto) Low CO2
  • southamerica-east1 (Sao Paulo, Brazil) Low CO2
  • southamerica-west1 (Santiago, Chile)
  • us-west2 (Los Angeles)
  • us-west3 (Salt Lake City)
  • us-west4 (Las Vegas)

If you already created a Cloud Run service, you can view the region in the Cloud Run dashboard in the console.

Retrieving the code sample

To retrieve the code sample for use:

  1. Clone the sample repository to your local machine:

    Node.js

    git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git

    Alternatively, you can download the sample as a zip file and extract it.

  2. Change to the directory that contains the Cloud Run sample code:

    Node.js

    cd nodejs-docs-samples/run/websockets/

Understanding the code

Socket.io is a library that enables real-time, bidirectional communication between the browser and server. Although Socket.io is not a WebSocket implementation, it wraps that functionality in a simpler API across multiple communication protocols, with additional features such as improved reliability, automatic reconnection, and broadcasting to all or a subset of clients.

Client-side integration

<script src="/socket.io/socket.io.js"></script>

The client instantiates a new Socket instance for every connection. Because this sample is server-side rendered, the server URL does not need to be defined. The socket instance can emit and listen to events.

// Initialize Socket.io
const socket = io('', {
  transports: ['websocket'],
});
// Emit "sendMessage" event with message
socket.emit('sendMessage', msg, error => {
  if (error) {
    console.error(error);
  } else {
    // Clear message
    $('#msg').val('');
  }
});
// Listen for new messages
socket.on('message', msg => {
  log(msg.user, msg.text);
});

// Listen for notifications
socket.on('notification', msg => {
  log(msg.title, msg.description);
});

// Listen connect event
socket.on('connect', () => {
  console.log('connected');
});

Server-side integration

On the server side, the Socket.io server is initialized and attached to the HTTP server. Similar to the client side, once the Socket.io server makes a connection to the client, a socket instance is created for every connection, which can be used to emit and listen for messages. Socket.io also provides a simple interface for creating "rooms": arbitrary channels that sockets can join and leave.

// Initialize Socket.io
const server = require('http').Server(app);
const io = require('socket.io')(server);

const redisAdapter = require('@socket.io/redis-adapter');
// Replace in-memory adapter with Redis
const subClient = redisClient.duplicate();
io.adapter(redisAdapter(redisClient, subClient));
// Add error handlers
redisClient.on('error', err => {
  console.error(err.message);
});

subClient.on('error', err => {
  console.error(err.message);
});

// Listen for new connection
io.on('connection', socket => {
  // Add listener for "signin" event
  socket.on('signin', async ({user, room}, callback) => {
    try {
      // Record socket ID to user's name and chat room
      addUser(socket.id, user, room);
      // Call join to subscribe the socket to a given channel
      socket.join(room);
      // Emit notification event
      socket.in(room).emit('notification', {
        title: "Someone's here",
        description: `${user} just entered the room`,
      });
      // Retrieve room's message history or return null
      const messages = await getRoomFromCache(room);
      // Use the callback to respond with the room's message history
      // Callbacks are more commonly used for event listeners than promises
      callback(null, messages);
    } catch (err) {
      callback(err, null);
    }
  });

  // Add listener for "updateSocketId" event
  socket.on('updateSocketId', async ({user, room}) => {
    try {
      addUser(socket.id, user, room);
      socket.join(room);
    } catch (err) {
      console.error(err);
    }
  });

  // Add listener for "sendMessage" event
  socket.on('sendMessage', (message, callback) => {
    // Retrieve user's name and chat room from socket ID
    const {user, room} = getUser(socket.id);
    if (room) {
      const msg = {user, text: message};
      // Push message to clients in chat room
      io.in(room).emit('message', msg);
      addMessageToCache(room, msg);
      callback();
    } else {
      callback('User session not found.');
    }
  });

  // Add listener for disconnection
  socket.on('disconnect', () => {
    // Remove socket ID from list
    const {user, room} = deleteUser(socket.id);
    if (user) {
      io.in(room).emit('notification', {
        title: 'Someone just left',
        description: `${user} just left the room`,
      });
    }
  });
});
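The `addUser`, `getUser`, and `deleteUser` helpers referenced above are not shown in the tutorial. A minimal in-memory sketch, an assumption about their shape keyed by socket ID, could look like this:

```javascript
// In-memory map from socket ID to the user's name and chat room.
// Keeping this per-instance is workable here because a reconnecting
// client re-registers itself via the "updateSocketId" event.
const users = new Map();

// Record a socket ID against a user's name and room.
function addUser(socketId, user, room) {
  if (!user || !room) throw new Error('User and room are required');
  users.set(socketId, {user, room});
}

// Look up the user and room for a socket ID; empty object if unknown.
function getUser(socketId) {
  return users.get(socketId) || {};
}

// Remove a socket ID and return whatever was recorded for it.
function deleteUser(socketId) {
  const entry = users.get(socketId) || {};
  users.delete(socketId);
  return entry;
}
```

Returning an empty object from `getUser` and `deleteUser` lets the callers above safely destructure `{user, room}` and branch on whether a session was found.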

Socket.io also provides a Redis adapter to broadcast events to all clients regardless of which server is serving the socket. Socket.io only uses Redis's Pub/Sub mechanism and does not store any data.

const redisAdapter = require('@socket.io/redis-adapter');
// Replace in-memory adapter with Redis
const subClient = redisClient.duplicate();
io.adapter(redisAdapter(redisClient, subClient));

Socket.io's Redis adapter can reuse the Redis client used to store the room's message history. Each container instance creates a connection to the Redis instance, and Cloud Run allows a maximum of 1,000 instances, well under the 65,000 connections that Redis can support. If you need to support that amount of traffic, also evaluate the throughput of the Serverless VPC Access connector.

Reconnection

Cloud Run has a maximum request timeout of 60 minutes, so you need to add reconnection logic to handle timeouts. In some cases, Socket.io automatically attempts to reconnect after disconnection or connection-error events. There is no guarantee that the client will reconnect to the same container instance.

// Listen for reconnect event
socket.io.on('reconnect', () => {
  console.log('reconnected');
  // Emit "updateSocketId" event to update the recorded socket ID with user and room
  socket.emit('updateSocketId', {user, room}, error => {
    if (error) {
      console.error(error);
    }
  });
});
// Add listener for "updateSocketId" event
socket.on('updateSocketId', async ({user, room}) => {
  try {
    addUser(socket.id, user, room);
    socket.join(room);
  } catch (err) {
    console.error(err);
  }
});

Container instances persist while they have an active connection, until all requests close or time out. Because there is no session affinity, new requests are load balanced across active container instances, which allows instances to scale in. If you are concerned about a large number of instances persisting after a traffic spike, you can lower the maximum timeout so that idle sockets are cleaned up more frequently.

Shipping the service

  1. Create a Memorystore for Redis instance:

    gcloud redis instances create INSTANCE_ID --size=1 --region=REGION

    Replace INSTANCE_ID with a name for the instance, such as my-redis-instance, and REGION with the region for all your resources and services, such as us-central1.

    The instance is automatically allocated an IP range from the default service network range. This tutorial uses 1 GB of memory for the local cache of messages in the Redis instance. Learn more about determining the initial size of a Memorystore instance for your use case.

  2. Set up a Serverless VPC Access connector:

    To connect to your Redis instance, your Cloud Run service needs access to the Redis instance's authorized VPC network.

    Every VPC connector requires its own /28 subnet to place connector instances on. This IP range must not overlap with any existing IP address reservations in your VPC network. For example, 10.8.0.0 (/28) will work in most new projects or you can specify another unused custom IP range such as 10.9.0.0 (/28). You can see which IP ranges are currently reserved in the console.

    gcloud compute networks vpc-access connectors create CONNECTOR_NAME \
      --region REGION \
      --range "10.8.0.0/28"
    

    Replace CONNECTOR_NAME with the name for your connector.

    This command creates a connector in the default VPC network, same as the Redis instance, with e2-micro machine size. Increasing the machine size of the connector can improve the throughput of the connector but also will increase cost. The connector must also be in the same region as the Redis instance. Learn more about Configuring Serverless VPC Access.

  3. Define an environment variable with the IP address of your Redis instance's authorized network:

     export REDISHOST=$(gcloud redis instances describe INSTANCE_ID --region REGION --format "value(host)")
  4. Create a service account to serve as the service identity. By default, this account has no privileges other than project membership.

    gcloud iam service-accounts create chat-identity

    This service does not need to interact with anything else in Google Cloud; therefore no additional permissions need to be assigned to this service account.

  5. Build and deploy the container image to Cloud Run:

    gcloud beta run deploy chat-app --source . \
        --vpc-connector CONNECTOR_NAME \
        --allow-unauthenticated \
        --timeout 3600 \
        --service-account chat-identity \
        --update-env-vars REDISHOST=$REDISHOST

    If you are prompted to enable required APIs, respond y. You only need to do this once per project. Respond to any other prompts by supplying the platform and region, if you haven't set defaults for these as described in the setup section. Learn more about deploying from source code.

Trying it out

To try out the complete service:

  1. Navigate your browser to the URL provided by the deployment step above.

  2. Add your name and a chat room to sign in.

  3. Send a message to the room!

If you choose to continue developing these services, remember that they have restricted Identity and Access Management (IAM) access to the rest of Google Cloud and will need to be given additional IAM roles to access many other services.

Cost discussion

Example cost breakdown for a chat service hosted in Iowa (us-central1) with a 5 GB Redis instance and a Serverless VPC Access connector:

Memorystore for Redis
Cost = Provisioned capacity (5 GB) * Regional Tier price (us-central1)

  • Standard Tier: 5 GB * $0.054/GB/hr * 730 hr/mo = $197
  • Basic Tier: 5 GB * $0.027/GB/hr * 730 hr/mo = $99

Serverless VPC Access
Cost = Machine size price * number of instances (min instances defaults to 2)

  • f1-micro: $3.88 * 2 instances = $7.76
  • e2-micro: $6.11 * 2 instances = $12.22
  • e2-standard-4: $97.83 * 2 instances = $195.66

Cloud Run
Cost = Resource usage time * price * number of instances

  • Memory: 0.5 GiB * $0.00000250/GiB-s * 60 s/min * 60 min/hr * 8 hr * 30.5 days * 4 instances = $4.39†
  • CPU: 1 vCPU * $0.00002400/vCPU-s * 60 s/min * 60 min/hr * 8 hr * 30.5 days * 4 instances = $84.33†
  • Requests: $0.40/million ≈ $0
  • Networking: $0.085/GB/month ≈ $0

Total: $197 (Redis) + $12.22 (connector) + $89 (Cloud Run) ≈ $298 per month

† See discussion below for scenario context.

This tutorial uses a standalone Basic Tier Redis instance. The service tier can be upgraded to Standard Tier for high availability, with automatically enabled cross-zone replication and automatic failover. Region and capacity also affect Redis pricing. For example, a Standard Tier 5 GB instance in Iowa (us-central1) costs $0.054 per GB per hour. The hourly cost is 5 * $0.054, which is approximately $0.27 per hour or $197 per month. Determine the initial size of a Memorystore instance and learn more about Redis pricing.

The Serverless VPC Access connector is priced by instance size and count, as well as network egress. Increasing the size or number of instances can improve throughput or reduce message latency. There are three machine sizes: f1-micro, e2-micro, and e2-standard-4. The minimum number of instances is 2, so the minimum cost is double the machine size price.

Cloud Run is priced by resource usage, rounded up to the nearest 100 ms, for memory, CPU, number of requests, and networking. This tutorial uses the default Cloud Run settings of 512 MiB of memory and 1 vCPU, which cost 0.5 GiB * $0.00000250 per GiB-second and 1 vCPU * $0.00002400 per vCPU-second, respectively, for a total of $0.091 per hour per instance. Concurrency has a large effect on total cost, although limits and recommendations vary by language; increasing concurrency can lower the number of container instances needed. Cloud Run supports a maximum of 250 concurrent requests per container instance, so with a maximum of 1,000 container instances you can serve up to 250,000 clients. For example, a service serving 1,000 employees with a concurrency of 250 needs at least 4 instances. Four instances for one month of 8-hour days would cost $0.091 * 4 instances * 8 hours * 30.5 days = $89. Requests and networking add further cost, but most likely a minimal amount.

This chat service, hosted in Iowa (us-central1), would cost $197 for a 5 GB Standard Tier Redis instance, $12.22 for a default Serverless VPC Access connector, and an estimated $89 for the Cloud Run service, for a total of $298 a month. View the estimate in the Google Cloud Pricing Calculator.
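The arithmetic behind this estimate can be reproduced directly. The rates below are the per-hour and per-second prices quoted above; any change in rates or usage assumptions changes the result:

```javascript
const HOURS_PER_MONTH = 730;

// Memorystore for Redis, Standard Tier: capacity (GB) * price per GB-hour.
const redis = 5 * 0.054 * HOURS_PER_MONTH;        // ≈ $197

// Serverless VPC Access: 2 e2-micro connector instances at $6.11 each.
const connector = 2 * 6.11;                       // $12.22

// Cloud Run: (memory + CPU price per second) * seconds billed * instances.
const perSecond = 0.5 * 0.0000025 + 1 * 0.000024; // $/s per instance
const secondsBilled = 8 * 3600 * 30.5;            // 8-hour days, 30.5 days
const cloudRun = perSecond * secondsBilled * 4;   // ≈ $89

const total = redis + connector + cloudRun;
console.log(total.toFixed(0)); // → 298
```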

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and wish to keep it without the changes added in this tutorial, delete resources created for the tutorial.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting tutorial resources

  1. Delete the Cloud Run service you deployed in this tutorial:

    gcloud run services delete SERVICE-NAME

    Where SERVICE-NAME is your chosen service name.

    You can also delete Cloud Run services from the Google Cloud console.

  2. Remove the gcloud default region configuration you added during tutorial setup:

     gcloud config unset run/region
    
  3. Remove the project configuration:

     gcloud config unset project
    
  4. Delete other Google Cloud resources created in this tutorial:
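The remaining resources created in this tutorial are the Redis instance, the VPC connector, and the service account. Assuming the names used earlier in this tutorial, they could be removed with commands like the following (verify names and regions before running):

```shell
# Delete the Memorystore for Redis instance.
gcloud redis instances delete INSTANCE_ID --region REGION

# Delete the Serverless VPC Access connector.
gcloud compute networks vpc-access connectors delete CONNECTOR_NAME \
    --region REGION

# Delete the service identity (replace PROJECT_ID with your project ID).
gcloud iam service-accounts delete \
    chat-identity@PROJECT_ID.iam.gserviceaccount.com
```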

What's next