This article discusses considerations for moving computer gaming servers to Google Cloud Platform (GCP). Most online games today feature dedicated game server processes that run on machines — virtual or physical. This guide is for game developers and operations teams familiar with running dedicated game server processes, either on-premises or in the cloud. Although there are a number of considerations when migrating game servers to GCP, the increased flexibility can provide substantial benefits, including an advantageous billing model, industry-leading global infrastructure, and the latest cloud technologies. This guide aims to help get you started with your migration. The game industry doesn't have standard terminology, so for the purposes of this article:
- Machine refers to the physical or virtual machine the game-server processes run on.
- Game server refers to the game-server process. Multiple game-server processes may run simultaneously on one machine.
- Instance refers to a single game-server process.
Reasons to run game servers on GCP
More and more game studios are moving their dedicated game servers to the cloud to get off the treadmill of attempting to match their hardware purchases to the fundamentally difficult-to-estimate launch-window player base. The ability to seamlessly handle bursts of activity and only pay for what you use can remove substantial risk from your launch. Some benefits include:
- Google Compute Engine VMs are billed by the minute, not by the hour. Sustained usage discounts are automatically applied for VMs you keep running — there is no need to reserve capacity and take on the associated cost risk.
- Custom VM shapes allow you to pay only for the amount of CPU and memory your dedicated game servers actually use.
- Compute Engine VMs start very quickly — from seconds (Linux) to a few minutes (Windows).
- Wide regional availability allows you to put your game servers close to your customers. Default global network space allows those servers to easily communicate back to your existing services without additional work.
- You can provision one-off environments for development, QA testing, or events in minutes and discard them as soon as they are no longer needed, without any commitment and without impacting other environments.
- You can easily configure builds hosted in Google Cloud Storage as a CDN origin or distribute builds directly out of buckets.
- Keeping snapshots of build artifacts on separate disks allows for easy OS upgrades, and makes it simple to promote builds to production by updating symlinks.
Using projects to segregate environments
It's not uncommon for development, testing, staging, and public environments to be separated into different projects. Projects are lightweight and can be easily created and destroyed. Another popular pattern is to create a project for a testing environment, deploy a build, have QA run smoke tests, and destroy the project when the build is ready to be promoted or rejected. This also guarantees separation of resources and quotas. For example, a bad build in testing cannot impact the production services by overconsuming resources. Destroying a project also deletes all of the resources in it, ensuring that forgotten test or dev environments don't show up on your bill.
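The create-test-destroy cycle above can be sketched with the gcloud CLI. This is a minimal sketch, and the project ID and name are placeholders for your own naming scheme:

```shell
# Sketch: spin up a throwaway QA project, then delete it when the build
# is promoted or rejected. Project ID and name below are placeholders.
gcloud projects create my-game-qa-build-1234 --name="QA build 1234"

# ... deploy the build and run smoke tests here ...

# Deleting the project tears down every resource inside it.
gcloud projects delete my-game-qa-build-1234 --quiet
```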
Getting builds to the cloud
Uploading builds to Cloud Storage incurs only the per-GB storage cost. Once there, it's easy to use this upload as a CDN origin, or to distribute builds to other studios or contractors. You can set Time to Live for uploaded objects using Object Lifecycle Management to keep your administration overhead manageable and your costs constrained.
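As one possible sketch of the lifecycle approach described above, the following sets a 30-day expiration on objects in a bucket; the bucket name and age are placeholders, and the JSON follows the Object Lifecycle Management configuration format:

```shell
# Sketch: expire uploaded builds automatically after 30 days so old
# builds don't accumulate storage costs. Bucket name is a placeholder.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-game-builds
```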
It's common to host assets and binaries in the cloud using an object store like Cloud Storage, or block storage such as persistent disks. Any existing processes you might have that use Amazon Web Services' S3 can be easily modified or extended to use Cloud Storage, which offers an S3-interoperable XML API. Although storing assets on Cloud Storage is a great way to distribute builds to customers, CDNs, and even remote offices, copying from Cloud Storage to your game server VMs can introduce additional load time, so it isn't a recommended approach. Many customers with a large number of dedicated game servers will already be familiar with the pattern of 'baking' disk images that contain all the necessary libraries, assets, and binaries required to start up a fresh dedicated game server VM. This strategy is as valid on GCP as it is on-premises or on other clouds. However, we recommend pairing snapshots with read-only disks, as detailed in the following section, an approach with significant advantages.
Snapshots with read-only disks strategy
In most games, the OS and game server build can be changed independently. We recommend that you don't put the game build on the OS disk, but on a separate persistent disk, with the necessary build artifacts symbolically linked (symlinked) to their expected directories. This setup creates a clean workflow for distributing multiple builds to a VM. Simply place each build on its own disk, attach all the disks to the VM, and update the symlinks when you're ready to change game server versions. Previous builds on separate disks can be left connected until you are confident that the new version is stable, allowing for quick rollbacks. You can attach a persistent disk in read-only mode to multiple VMs, which saves costs and eliminates the need to distribute assets to all the VMs.
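The symlink swap described above can be sketched as follows. The paths are illustrative; on GCP, each build directory would typically live on a separately attached persistent disk (for example, under a mount point such as /mnt/disks):

```shell
#!/bin/sh
# Sketch: switch the active game server build by updating a symlink.
# Build paths below are illustrative placeholders.
set -e

mkdir -p /tmp/builds/build-41 /tmp/builds/build-42

# ln -sfn replaces the existing "current" link in one step, so a server
# start script reading /tmp/builds/current always finds a valid path.
ln -sfn /tmp/builds/build-42 /tmp/builds/current

readlink /tmp/builds/current   # -> /tmp/builds/build-42
```

To roll back, point the symlink at the previous build's disk, which is still attached.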
When implementing this approach as part of a game development pipeline, we
recommend that you configure your build system to take the necessary steps to
create the disk with all the artifact files in the appropriate directory
structure. For example, you could use a simple script that runs the necessary
commands, or a GCP-specific plugin for your build system of choice. We also
recommend that you create multiple copies of the disk, and have VMs connect to
these copies in a balanced manner, both for throughput considerations and to
manage failure risk. Note that you can create multiple persistent disks from a
single snapshot at the same time, which is an effective way to quickly
generate multiple disk copies of the same game server build.
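A minimal sketch of that fan-out, with snapshot, disk, VM, and zone names as placeholders:

```shell
# Sketch: create several disk copies of one build snapshot in parallel.
# Snapshot, disk, VM, and zone names are placeholders.
for i in 1 2 3; do
  gcloud compute disks create "build-1234-copy-${i}" \
      --source-snapshot=build-1234 \
      --zone=us-central1-b &
done
wait

# Attach a copy in read-only mode so several VMs can share the disk.
gcloud compute instances attach-disk game-vm-1 \
    --disk=build-1234-copy-1 --mode=ro --zone=us-central1-b
```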
GCP runs on Google's custom-built private fiber network, providing very low latency between major metropolitan points of presence (PoPs) and GCP data centers. GCP's networking model keeps vital gameplay traffic on the Google network, as much as possible, between the region hosting your servers and the Google PoP nearest to your customer's ISP. Contrast this with the more common "hot-potato" routing strategies used by other vendors, which hand your traffic off to the public Internet as soon as it leaves the VM. Google's network can deliver higher reliability and more predictable latency for your players.
In addition, GCP defaults to a global network space. There is no need to
"connect" different regions or zones by using VPNs or other networking software.
If you start a VM in the us-central1-b zone and another VM in a different zone,
and then configure them to use the same network, they can reach each other with
no additional configuration. All traffic between the zones stays on Google's
private network, avoiding extra latency from BGP-routed hops across the public
Internet.
Networking between projects
By default, each GCP project you create represents its own isolated, global network. VMs in separate projects can't communicate with one another without going over the public Internet. Although there are a number of ways to establish cross-project networking, we recommend that you use a single project to run servers and services that you expect to intercommunicate in private network space.
GCP networking for game servers
The most common networking pattern for dedicated game servers on GCP is to create a "game" network using the Google Cloud Platform Console. Configure this network with the appropriate game ports open to Internet traffic, as well as any required internal ports for health checks, monitoring, or platform service communication. Then, specify this network when creating VMs that host dedicated game servers.
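The pattern above can be sketched with the gcloud CLI. The network name, port numbers, and source ranges are placeholders for your own configuration:

```shell
# Sketch: a dedicated "game" network with game ports open to the
# Internet and internal traffic allowed for health checks/monitoring.
gcloud compute networks create game-network --subnet-mode=auto

# Allow UDP game traffic from anywhere on an example port range.
gcloud compute firewall-rules create game-traffic \
    --network=game-network --direction=INGRESS \
    --allow=udp:7000-8000 --source-ranges=0.0.0.0/0

# Allow internal health-check and monitoring traffic between servers
# (10.128.0.0/9 is the internal range used by auto-mode networks).
gcloud compute firewall-rules create game-internal \
    --network=game-network --direction=INGRESS \
    --allow=tcp:8080 --source-ranges=10.128.0.0/9
```

You can then pass this network when creating the game server VMs, for example with the --network flag on gcloud compute instances create.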
After game servers are migrated to GCP, you might find benefits in further optimizing your server builds to use GCP infrastructure.
Clock speeds are fairly constant per vCPU, and Google has cloud regions offering each of the last several Intel CPU architectures. Some game customers have found speed improvements of up to 20% on Linux when compiling with architecture-specific flags and running their servers in regions that offer CPUs from those families. This consistency of vCPU speed also lends itself to a scale-out rather than scale-up approach to CPU-intensive tasks. When possible, use worker threads and parallelize your code so that it can use several CPUs, instead of needing more cycles from a few.
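As a toy illustration of scale-out (the "work" command here is a placeholder, not a real game server task), a shell pipeline can fan independent work units across several CPUs instead of running them serially:

```shell
# Toy illustration of scale-out: distribute 8 independent work units
# across 4 parallel workers with xargs -P. The echo stands in for a
# placeholder CPU-bound task.
seq 8 | xargs -P 4 -I{} sh -c 'echo "processed unit {}"' | sort
```

Because the workers run in parallel, output order is nondeterministic; the final sort is only there to make the result readable.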
Although there is a charge for traffic egress, there is no charge for ingress. Consider asymmetric communication patterns that use a higher update rate for packets from the client than for packets from the server. Often the effect of a lower server update rate can be mitigated with interpolation or extrapolation. A suggested primer on these topics is the game networking series on Glenn Fiedler's site, Gaffer on Games. In addition, if your client and server's marshalling/serialization code can be made sufficiently efficient, consider using packet compression to keep your packet sizes small.
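As a toy illustration of why compression pays off on redundant payloads (the payload contents here are hypothetical), gzip shrinks a repetitive state blob dramatically:

```shell
# Toy illustration (hypothetical payload): repetitive serialized state
# compresses well, so per-packet compression can cut wire size when
# serialization leaves redundancy in the payload.
printf 'player:1 x:100 y:200 hp:100 %.0s' $(seq 50) > /tmp/payload.bin
gzip -c /tmp/payload.bin > /tmp/payload.bin.gz
# Compare sizes; the compressed copy should be much smaller.
wc -c < /tmp/payload.bin
wc -c < /tmp/payload.bin.gz
```

Real game packets are smaller and less redundant than this blob, so measure against your own serialization format before committing to compression.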
Analytics events and logging optimization
Consider establishing a high-volume analytics event and logging path to Google BigQuery or Cloud Storage, and having the ability to turn this on and off for each individual game server process. This approach allows you to quickly investigate reported exploits, analyze gameplay for balance, or generate ML training data. Data stored in either service is very cheap ($102.40/month for 5 TB of storage currently, with a robust history of price drops) and can be set to automatically expire after a given date, giving you easy control over costs and data volume.
Externalization of game state
Our overview of cloud game infrastructure briefly covers the potential advantages of externalizing state. It's worth considering which portions of your game state can be externalized. Some amount of externalized state can be realized today, and new cloud technologies continue to improve this ability. Externalized state can enable many interesting features, such as:
- Allow migration of game server processes between VMs during play, with minimal interruption. This can be used in many scenarios, including:
- Co-locating game servers or services with high amounts of intra-server communication in the same region, or even on the same VM, without trying to predict users' traffic patterns ahead of time.
- Segregating processes that have high resource usage.
- Seamlessly live-migrating processes away from VMs that need to be shut down for updates, or removed to reduce usage after peak demand has passed.
- Allow multiple processes to work on different parts of the simulation and share updates with each other in a manner similar to sharing memory, but on a much larger scale.
- Allow serialization of state to be saved to disk or shared with another game server, which can open up new possibilities for eSports, such as:
- Server-captured authoritative replays with a chain of custody.
- "Mirrored" dedicated game servers, enabling very large numbers of live spectators in game clients.
- Overview of Cloud Game Infrastructure
- Architecture: Optimizing Large-Scale Ingestion of Analytics Events and Logs
- Share a persistent disk between multiple instances
- Creating a persistent disk from a snapshot
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.