This solution provides an overview of common components and design patterns used to host game infrastructure on cloud platforms.
Video games have evolved over the last several decades into a thriving entertainment business. With broadband Internet access becoming widespread, online play has become one of the key factors in the growth of games.
Online play comes in several forms, such as session-based multiplayer matches, massively multiplayer virtual worlds, and intertwined single-player experiences.
In the past, games using a client-server model required the purchase and maintenance of dedicated on-premises or co-located servers to run the online infrastructure, something only large studios and publishers could afford. In addition, extensive projections and capacity planning were required to meet customer demand without overspending on fixed hardware. With today's cloud-based compute resources, game developers and publishers of any size can request and receive resources on demand, avoiding costly up-front outlays and the risk of over- or under-provisioning hardware.
The following diagram illustrates the online portion of a gaming architecture.
The frontend components of the gaming architecture include:
- Game platform services that provide extra-game functionality.
- Dedicated game servers that host the game.
The backend components of the gaming architecture include:
- Game state, persisted in the system of record and typically stored in the game database.
- Analytics stack that stores and queries analytics and gameplay events.
These components can be hosted on a variety of environments: on-premises, private or public cloud, or even a fully managed solution. As long as the system meets your latency requirements for communication between the components and end users, any of these can work.
The frontend provides interfaces that clients can interact with, either directly or through a load balancing layer.
For example, in a session-based first-person shooter, the frontend typically includes a matchmaking service, which distributes connection information for dedicated game server instances to clients:
- A client sends a request to the matchmaking service.
- The matchmaking service sends connection information to the client.
- The client connects directly to the dedicated game server instance using User Datagram Protocol (UDP).
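The request flow above can be sketched as a toy in-memory matchmaker. This is a minimal sketch, not a production design. The `Matchmaker`, `ServerInstance`, and `ConnectionInfo` names, the per-instance capacity, and the "least-loaded instance with free slots" policy are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionInfo:
    """What the matchmaker hands back to a client (hypothetical shape)."""
    host: str
    port: int  # UDP port of the dedicated game server instance


@dataclass
class ServerInstance:
    host: str
    port: int
    capacity: int
    players: list = field(default_factory=list)


class Matchmaker:
    """Assigns each requesting client to the least-loaded instance with room."""

    def __init__(self, instances):
        self.instances = instances

    def request_match(self, player_id: str) -> ConnectionInfo:
        # Only consider instances that still have free player slots.
        open_instances = [i for i in self.instances if len(i.players) < i.capacity]
        if not open_instances:
            raise RuntimeError("no game server capacity available")
        target = min(open_instances, key=lambda i: len(i.players))
        target.players.append(player_id)
        # The client uses this to talk to the instance directly over UDP.
        return ConnectionInfo(target.host, target.port)


# Usage: two hypothetical instances, each with room for two players.
mm = Matchmaker([ServerInstance("10.0.0.5", 7777, capacity=2),
                 ServerInstance("10.0.0.6", 7777, capacity=2)])
info = mm.request_match("alice")
```

After the client receives `info`, it opens a UDP socket to `(info.host, info.port)`. From that point, the matchmaker is no longer in the gameplay data path.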
Frontend services don't have to be used exclusively by external clients. It is common for frontend services to communicate with each other and with the backend.
Because frontend services are available over the Internet, they are more exposed to attacks. To address these security and reliability concerns, consider hardening your frontend services against denial-of-service attacks and malformed packets. In comparison, backend services are generally accessible only to trusted code, and might therefore be harder to attack.
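As one illustration of hardening, a frontend can discard malformed packets before doing any further parsing and rate-limit each source address. The sketch below rests on several assumptions:

- The packet layout is hypothetical: a 2-byte magic value, a 1-byte version, a 2-byte payload length, then the payload.
- Each client is limited by a token bucket, and the rate and burst values are illustrative.
- Real games define their own wire formats.
- Volumetric denial-of-service attacks usually also need network-level protection.

```python
import struct
import time

MAGIC = 0x4742                  # hypothetical protocol identifier
VERSION = 1
HEADER = struct.Struct("!HBH")  # magic, version, payload length (network byte order)
MAX_PAYLOAD = 1200              # assumed limit that keeps datagrams small


def parse_packet(data: bytes):
    """Return the payload, or None if the packet is malformed."""
    if len(data) < HEADER.size:
        return None
    magic, version, length = HEADER.unpack_from(data)
    if magic != MAGIC or version != VERSION:
        return None
    # Reject oversized payloads and packets whose length field lies.
    if length > MAX_PAYLOAD or len(data) != HEADER.size + length:
        return None
    return data[HEADER.size:]


class TokenBucket:
    """Allow `rate` packets per second with bursts of up to `burst` packets."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per source address; a real server would also evict idle entries.
buckets = {}


def accept(addr, data: bytes):
    bucket = buckets.get(addr)
    if bucket is None:
        bucket = buckets[addr] = TokenBucket(rate=60, burst=10)
    if not bucket.allow():
        return None  # over the rate limit: drop silently
    return parse_packet(data)
```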
Game platform services
Common names for this component are platform services or online services. Platform services provide interfaces for essential meta-game functions, such as allowing players to join the same dedicated game server instance, or holding the "friend list" social graph for your game. It's common for the platform your game is running on, such as Steam, Xbox Live, or Google Play Games, to provide these services:
- Leaderboard and match history
- Online lobby
- Inventory management
- Cross-platform unlock
- Social identity mapping
Game platform services have evolved in much the same way as web services:
In the early 2000s, a typical suite of platform services would run as a monolithic service, frequently implemented as a singleton. Even when federated, this pattern is not recommended for cloud deployments.
The now-familiar service-oriented architecture (SOA) pattern became popular in the mid-2000s, as the industry split the various services apart so that each could scale independently. In addition, the services could now be accessed not only by game clients and servers, but also by web services and, eventually, smartphone apps.
The last half-decade has seen many developers adopting the microservices approach championed by fast-scaling web companies. Many of the fundamental challenges of platform services and web applications are the same, such as enabling fast development cycles and running highly distributed services all over the world. Microservices can help address these problems and are an excellent choice when designing applications to run on cloud platforms.
In addition, there are now many hosted or managed services that provide either a way to build platform services or fully managed platform services.
Backend platform services
Although most platform services are accessed by external clients, sometimes it makes sense for a platform service to be accessed only by other parts of your online infrastructure. One example is a competitive player-ranking service that isn't exposed publicly. Although such backend platform services typically lack an external network route and IP address, they follow the same design practices as frontend platform services.
Google Cloud Platform platform services solutions
The following solutions provide more information about how to build backend services on Cloud Platform.
- Platform services for mobile based on Google App Engine and Firebase
- Designing APIs to communicate between microservices on App Engine
Dedicated game server
Dedicated game servers provide the game logic. To minimize latency perceived by the user, client game apps typically communicate directly with the dedicated game servers. This makes them part of the frontend service architecture.
The industry doesn't have standard terminology, so for the purposes of this article:
- machine refers to the physical or virtual machine the game-server processes run on.
- game server refers to the game-server process. Multiple game-server processes may run simultaneously on one machine.
- instance refers to a single game-server process.
Types of dedicated game servers
The term dedicated can be misleading for today's game servers. In its original context, dedicated referred to game servers that ran on dedicated hardware in a 1:1 ratio. Today, most game publishers manage multiple game-server processes running concurrently on a machine. Although these processes now rarely have entire machines dedicated to them, the term dedicated game server is still in frequent use in the gaming industry.
Dedicated game servers are as varied as the games they run. A few high-level game server categories are discussed in the following sections.
Until recently, almost every dedicated game server for a commercially shipped product was part of the frontend for a real-time simulation game. Real-time simulation game servers have historically pushed the limits of vertical scaling. The most demanding games have moved to manual horizontal scaling tactics, such as running multiple server processes per machine or geographically sharding the world. UDP communication with custom flow control, reliability, and congestion avoidance is the dominant networking paradigm.
Most real-time simulation game servers are implemented as an endless loop, where the goal is to finish each iteration of the loop within a given time budget. Typical time budgets are about 16 or 33 milliseconds, which yield state update rates of 60 or 30 times per second, respectively. The update rate is also referred to as the frame rate or tick rate. Although the server updates its simulation at a high frequency, it is common for the server to send state updates to clients only after several updates have passed, which keeps network bandwidth requirements reasonable. The effects of less frequent updates can be mitigated using strategies such as lag compensation, interpolation, and extrapolation.
All of this means that real-time simulation game servers run latency-sensitive, compute- and bandwidth-intensive workloads requiring careful consideration of the game server design and the compute platforms it runs on.
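A minimal sketch of the fixed-timestep loop described above follows. It assumes a 60 Hz tick and a hypothetical `SNAPSHOT_EVERY` of 3, so clients would receive 20 state updates per second.

- The `simulate` and `send_snapshot` callbacks stand in for game-specific code.
- The clock and sleep functions can be injected, so the loop can be tested without real time passing.
- `interpolate` shows the simplest form of client-side smoothing between two received snapshots.

```python
import time

TICK_RATE = 60                 # simulation updates per second
TICK_BUDGET = 1.0 / TICK_RATE  # about 16.7 ms per tick
SNAPSHOT_EVERY = 3             # hypothetical: send state every 3rd tick (20 Hz)


def run_server(num_ticks, simulate, send_snapshot,
               clock=time.monotonic, sleep=time.sleep):
    """Run num_ticks fixed-step ticks and return how many overran the budget."""
    overruns = 0
    next_deadline = clock() + TICK_BUDGET
    for tick in range(num_ticks):
        simulate(tick, TICK_BUDGET)       # advance the world by one fixed step
        if tick % SNAPSHOT_EVERY == 0:
            send_snapshot(tick)           # throttle network updates
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)              # idle until the next tick is due
        else:
            # Missed the budget; later ticks skip sleeping until caught up.
            overruns += 1
        next_deadline += TICK_BUDGET
    return overruns


def interpolate(a, b, alpha):
    """Linearly blend two snapshot positions; alpha runs from 0.0 to 1.0."""
    return tuple(x + (y - x) * alpha for x, y in zip(a, b))
```

Injecting `clock` and `sleep` is a design choice for testability. A real server would use the defaults and track `overruns` as a health metric.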
Session- or match-based games
Games where the servers are designed to run discrete sessions are very common today. Typical examples are:

- The multiplayer sessions of first-person shooter (FPS) games such as Call of Duty™, Overwatch™, or Titanfall™.
- Multiplayer online battle arena (MOBA) games such as Dota 2™ or Vainglory™.

These games have servers that require twitch-fast gameplay and detailed game-state calculations, frequently with threads devoted to AI or physics simulation.
Massively multiplayer persistent worlds
Almost two decades ago, Ultima Online™ paved the way for a huge explosion of massively multiplayer online (MMO) games. Today's most popular MMOs, such as World of Warcraft™ and Guild Wars™, are characterized by complicated server designs with an ever-evolving set of features.
Complex issues are common in MMO game servers, such as passing game entities between server instances, sharding or phasing the game world, and physically co-locating the instances simulating adjacent game world areas. The compute and memory requirements to calculate state updates for a persistent world containing hundreds or thousands of players can lead to solutions such as the time dilation of Eve Online™.
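To make sharding and time dilation more concrete, here is a minimal sketch under stated assumptions. It is not how any particular MMO works.

- The world is split into a uniform grid of hypothetical `CELL_SIZE` cells, each owned by one server instance.
- The 8 neighboring cells are the areas whose instances must exchange entities near borders, which is why co-locating them helps.
- Time dilation is modeled as a factor applied to the game clock when ticks overrun, with an illustrative floor of 10%.

```python
import math

CELL_SIZE = 1000.0  # hypothetical world units per grid cell


def cell_for(x: float, y: float):
    """Map a world position to the grid cell that owns it."""
    return (math.floor(x / CELL_SIZE), math.floor(y / CELL_SIZE))


def neighbors(cell):
    """The 8 adjacent cells, whose instances exchange entities near borders."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def needs_handoff(old_pos, new_pos):
    """True when an entity crosses a cell boundary, so ownership may move."""
    return cell_for(*old_pos) != cell_for(*new_pos)


def dilation_factor(tick_budget_ms, last_tick_ms, floor=0.1):
    """Fraction of real time the game clock advances when ticks run long."""
    if last_tick_ms <= tick_budget_ms:
        return 1.0
    return max(floor, tick_budget_ms / last_tick_ms)
```

With dilation, each tick advances game time by `real_dt * dilation_factor(...)`. Everyone in the overloaded area therefore experiences a slower but still consistent simulation.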
Request/response based servers
Technically, all dedicated game servers are based on a series of requests and responses. However, mobile game servers in particular, which lack a critical need for real-time communication, have adopted HTTP request and response semantics like those used in web hosting.
The challenges for request/response game servers are the same as those for any web service, including:
- Keeping the response time of the game server as fast as possible.
- Distributing the game servers globally to reduce latency and add redundancy.
- Validating the game client's actions on the server to protect against exploits or cheating.
- Hardening the game servers against denial of service and other attacks.
- Implementing exponential backoff for communication retries on the client side.
- Creating “sticky” sessions or externalizing process state.
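Client-side exponential backoff can be sketched as a retry wrapper. It assumes a hypothetical `TransientError` that the request function raises for retryable failures. The delay constants are illustrative.

The "full jitter" strategy shown here is one common choice: each client sleeps a random fraction of the current cap. This way, many clients that failed at the same moment don't retry in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Raised by the request function for retryable failures (hypothetical)."""


def call_with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `request` with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Cap grows 0.5, 1, 2, 4, ... seconds, limited to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random duration in [0, cap).
            sleep(cap * rng())
```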
The strengths of request/response game servers, such as compact communication semantics and ease of retries after an application or network failure, work well for turn-based and mobile games.
Externalizing the game world state
Increasingly, players expect zero game downtime. This means you need to protect their experience from issues affecting individual server instances. To help do so, a game should persist the player state outside of a single game-server process. The advantages are many, such as resilience against crashed server processes and the ability to effectively load-balance.
Unfortunately, simply using externalized state patterns popular in web services can be problematic for a number of reasons, including:
- The speed at which updates can be written to external state can be a challenge when you have many unique entities updating dozens of times per second. This is true even if you use in-memory key-value stores such as Memcached or Redis.
- Tail latency of queries against external-state caches is a significant problem. It's difficult to meet state-update deadlines if 1%, or even 0.1%, of your queries take an order of magnitude longer than the update deadline.
- Determining which processes have read-only versus read-write authority over objects in your external state cache introduces complexity to the server model.
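A quick worked estimate shows why tail latency matters so much for a tick-based server. Suppose a tick must wait for all of its external-state queries. Then the probability that at least one of `n` queries is slow is 1 - (1 - p)^n, where `p` is the per-query probability of a slow response. This assumes queries are independent, which is a simplification.

```python
def p_tick_stalls(p_slow_query: float, queries_per_tick: int) -> float:
    """Chance that at least one query in a tick is slow (independence assumed)."""
    return 1.0 - (1.0 - p_slow_query) ** queries_per_tick


for p in (0.01, 0.001):
    for n in (10, 100):
        print(f"p={p}, queries/tick={n}: {p_tick_stalls(p, n):.1%} of ticks stall")
```

Even at 0.1% slow queries, a tick that issues 100 queries stalls roughly 9.5% of the time. At a 60 Hz tick rate, that is several missed deadlines every second.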
However, solving these problems has several beneficial side effects. Externalized state that is available to many processes, with proper access management in place, can greatly simplify calculating portions of the game state update in parallel. It similarly simplifies migrating entities between instances.
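One way to express read-only versus read-write authority is single-writer ownership with optimistic version checks:

- Any instance may read an entity.
- Only the owning instance may write it.
- Migrating an entity hands ownership to another instance.

The sketch below is a hypothetical in-memory stand-in for an external state cache. A real system would implement the same rules with the chosen store's own atomic operations, such as compare-and-set.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    owner: str    # the single game-server instance with write authority
    version: int  # bumped on every write or ownership change
    state: dict


class StateStore:
    """Toy stand-in for an external state cache with per-entity authority."""

    def __init__(self):
        self.entities = {}

    def create(self, entity_id, owner, state):
        self.entities[entity_id] = Entity(owner, 0, dict(state))

    def read(self, entity_id):
        # Any process may read; it gets a copy plus the version it saw.
        e = self.entities[entity_id]
        return e.version, dict(e.state)

    def write(self, entity_id, writer, expected_version, new_state):
        e = self.entities[entity_id]
        if writer != e.owner:
            raise PermissionError(f"{writer} has read-only access to {entity_id}")
        if expected_version != e.version:
            return False  # stale write based on an old read: caller must re-read
        e.state, e.version = dict(new_state), e.version + 1
        return True

    def migrate(self, entity_id, from_owner, to_owner):
        """Hand write authority to another instance, e.g. at a zone border."""
        e = self.entities[entity_id]
        if e.owner != from_owner:
            raise PermissionError("only the current owner can hand off authority")
        e.owner = to_owner
        e.version += 1  # invalidates writes based on reads taken before the handoff
```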
Cloud Platform dedicated game server solutions
The following articles describe how to run dedicated game servers on Cloud Platform.
- Dedicated Game Server Migration Guide
- Setting up a Minecraft server on Compute Engine
- Running Dedicated Game Servers in Kubernetes Engine
Backend services present interfaces only to other frontend and backend services. External clients can't directly communicate with a backend service. Backend services typically provide a way for you to store and access data, such as game state data in a database, or logging and analytics events in a data warehouse.
Among the scenarios that can cause players to quit playing your game and never return are non-working servers and the loss of player progress. Unfortunately, both are possible if you have a poorly designed database layer.
The database that holds the game-world state and player progression data could be considered the most critical piece of your game’s infrastructure.
You should evaluate the ability of the database to handle not only your expected workload, but also the workload required if your game becomes a massive success. A backend designed and tested for an estimated player base that suddenly receives an order of magnitude more load is unlikely to serve anyone reliably. Failure to plan for unexpected success can cause your game to fail, as players may abandon your game when it becomes unplayable due to database issues.
Games are particularly vulnerable to this issue. Most businesses with a successful product can expect gradual, organic growth. A typical game, by contrast, sees a large spike of initial interest followed by a quick fall-off to much lower usage. If your game is a hit, an overtaxed database may take a long time to save user progress, or even fail to save it altogether. No game developer wants to be forced to decide which game features will stop supporting real-time updates, so plan your database resources carefully.
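A back-of-the-envelope estimate is a useful first step in that planning. The sketch below turns a concurrent player count and a save frequency into a database write rate, then scales it to 10x. All inputs are hypothetical placeholders: 50,000 concurrent players, 2 saves per player per minute, and an optional launch-spike multiplier. Substitute numbers from your own game's design.

```python
def peak_write_qps(concurrent_players, saves_per_player_per_min, peak_factor=1.0):
    """Rough database writes per second from player count and save frequency."""
    return concurrent_players * saves_per_player_per_min / 60.0 * peak_factor


# Hypothetical inputs; replace with your own projections.
expected = peak_write_qps(50_000, saves_per_player_per_min=2)
print(f"expected: {expected:,.0f} writes/s, 10x success: {10 * expected:,.0f} writes/s")
```

The arithmetic is trivial, but writing it down early makes it obvious when a single database instance has no realistic path to the 10x case.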
When designing a game database:
- Make an informed decision. Don't choose a database during development just because it's easy to test against, and then let it become your production database without evaluating the alternatives. Understand the type and frequency of database access your game generates at your expected player base, and at 10x those estimates. Then you can make an informed choice about which backend can best handle those scenarios. Don't put yourself in the position of learning how to deal with a database crisis when the crisis hits.
- Don't assume one solution is the right solution. There is no rule that you can only run one type of database. Many successful games store account information and process in-game purchases using a relational database while keeping game state information in a separate NoSQL database. The NoSQL database is better at handling high-volume, low-latency workloads, while the relational database can provide guaranteed transactions.
- Back up your data. Regular and geographically distributed backups are important to recovering from database failure.
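The split between a relational database for purchases and a NoSQL store for game state can be sketched as follows. This is a minimal illustration using two stand-ins:

- Python's built-in `sqlite3` stands in for the relational database.
- A plain dictionary stands in for the NoSQL document store.

Both are placeholders. The table names, columns, and `buy_item` function are hypothetical. The point is that the wallet debit and the purchase record succeed or fail together.

```python
import sqlite3

# Relational store: wallets and purchases need all-or-nothing transactions.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE wallets (player_id TEXT PRIMARY KEY,
                          gems INTEGER NOT NULL CHECK (gems >= 0));
    CREATE TABLE purchases (player_id TEXT, item TEXT, price INTEGER);
""")
db.execute("INSERT INTO wallets VALUES ('alice', 100)")
db.commit()

# Stand-in for a NoSQL document store holding high-volume game state.
game_state = {}


def buy_item(player_id, item, price):
    """Debit the wallet and record the purchase atomically; False if unaffordable."""
    try:
        with db:  # commits on success, rolls back if any statement fails
            db.execute("UPDATE wallets SET gems = gems - ? WHERE player_id = ?",
                       (price, player_id))
            db.execute("INSERT INTO purchases VALUES (?, ?, ?)",
                       (player_id, item, price))
    except sqlite3.IntegrityError:
        return False  # CHECK constraint fired: not enough gems, nothing written
    # Frequently updated, non-transactional state goes to the document store.
    doc = game_state.setdefault(player_id, {"inventory": []})
    doc["inventory"].append(item)
    return True
```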
Many game development teams begin with a single relational database. When the data and traffic grow to the point where database performance is no longer acceptable, a common first approach is to scale the database. Once scaling is no longer feasible, many developers implement a custom database service layer. In this layer, you can prioritize queries and cache results, both of which limit database access. By scaling the database and adding a database service layer, you can build a game backend that handles huge numbers of players, but these methods have some common issues:
- Scaling: Traditional relational databases focus on a scale-up (vertical scaling) approach. When planning a cloud-native game backend, however, a scale-out (horizontal scaling) approach is strongly recommended instead. The number of cores available in a single VM is always limited, while adding VMs to your cloud project is simple. Relational databases have patterns for horizontal scaling, such as sharding, clustering, and tiered replicas, but these can be difficult to add to a running database without downtime. If there is any chance that your traffic or data will outgrow a single database, start with a small cluster. You don't want to have to learn how to scale your database when the crisis hits. Adding nodes to a running cluster isn't without challenges, but it is possible.
- Schema changes: Very few successful games launch with a database schema that lasts for the lifetime of the game. Players demand new features and content, and these additions require saving new types of data to the database. Early in your development process, you should determine how you'll update your schema. Trying to update your schema after launching your game without an established process could result in unplanned downtime or even loss of player data.
- Administration: Scaling a running relational database and updating its schema are both complex operations. Automatically managed relational databases are common services on cloud platforms, but their adoption rate for game backends is currently low because of the write-heavy workloads of game backends.
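A database service layer of the kind described above might route each player to a shard and cache reads. The sketch below covers only those two duties and omits query prioritization. It is hypothetical throughout:

- Shards are plain dictionaries.
- Routing uses a stable SHA-256 hash of the player ID modulo the shard count.
- Cached reads expire after an assumed TTL.

```python
import hashlib
import time


class DatabaseServiceLayer:
    """Hypothetical layer between game servers and a set of database shards."""

    def __init__(self, shards, ttl_seconds=5.0, clock=time.monotonic):
        self.shards = shards   # shard clients; plain dicts in this sketch
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}        # player_id -> (expires_at, value)
        self.db_reads = 0      # reads that actually reached a shard

    def shard_for(self, player_id: str):
        # Stable hash, so every process routes a player to the same shard.
        digest = hashlib.sha256(player_id.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def get_profile(self, player_id: str):
        hit = self.cache.get(player_id)
        if hit and hit[0] > self.clock():
            return hit[1]      # served from cache, no database access
        self.db_reads += 1
        value = self.shard_for(player_id).get(player_id)
        self.cache[player_id] = (self.clock() + self.ttl, value)
        return value

    def put_profile(self, player_id: str, profile):
        self.shard_for(player_id)[player_id] = profile
        self.cache.pop(player_id, None)  # invalidate so the next read refetches
```

Simple modulo routing also illustrates the scaling problem above. Adding a shard changes `len(self.shards)` and remaps most players, which means moving data in a live system. Schemes such as consistent hashing reduce how much data must move.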
Non-relational databases can provide a solution for operating at scale, especially with write-heavy workloads. However, they require that you understand NoSQL data models, access patterns, and transactional guarantees.
There are many types of NoSQL databases, and those well-suited for storing game world state have the following features:
- Scaling: They are designed with horizontal scaling in mind, and often use it by default. Resizing clusters is typically an operation that can be done without downtime, though sometimes there is some performance loss until the additional nodes are fully integrated.
- Schema changes: The schema is dynamic and enforced by the application layer. This is a huge advantage and means adding a new field for a new game feature can be trivial.
- Administration: Most cloud providers offer at least one hosted or managed NoSQL data storage engine, and Google Cloud Platform offers several.
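Because the application layer enforces the schema, a common pattern is to store a version number in each document and upgrade old documents lazily when they are read. The sketch below assumes that pattern. The `schema_version` field, the two migrations, and the field names they touch are hypothetical examples of how a game might evolve.

```python
CURRENT_SCHEMA = 3


def migrate_v1_to_v2(doc):
    # Hypothetical v2: "name" was renamed to "display_name".
    doc["display_name"] = doc.pop("name", "")
    return doc


def migrate_v2_to_v3(doc):
    # Hypothetical v3: a new pets feature; existing players start with none.
    doc.setdefault("pets", [])
    return doc


MIGRATIONS = {1: migrate_v1_to_v2, 2: migrate_v2_to_v3}


def load_player(doc):
    """Upgrade a stored document step by step to the current schema on read."""
    version = doc.get("schema_version", 1)  # documents without a version are v1
    while version < CURRENT_SCHEMA:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc["schema_version"] = version
    return doc
```

Applying migrations one step at a time means each new feature only needs a migration from the previous version. The upgraded document can be written back the next time the player's state is saved.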
Google Cloud Platform game database solutions
- Using Cloud SQL Second Generation as a mobile game backend database
- Deploying MongoDB on Compute Engine
- How to Set Up PostgreSQL for High Availability and Replication with Hot Standby
Analytics has grown into an important component of modern games. Both online services and game clients can send analytics and telemetry events to a common collection point, where the events are stored in a database. They can then be queried by everyone from gameplay programmers and designers to business intelligence analysts and customer service representatives. As the complexity of the analytics that are being collected grows, so does the need to keep these events in a format that can be easily and quickly queried.
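Sending client and server events to a common collection point is easier when all sources share one event envelope and send events in batches. The sketch below shows one possible shape:

- The envelope fields are hypothetical.
- The batcher buffers events and flushes them as newline-delimited JSON.
- The `send` callable stands in for whatever transport your collection point accepts.

```python
import json
import time
import uuid


def make_event(source, event_type, payload, clock=time.time):
    """Common envelope so client and server events can be queried together."""
    return {
        "event_id": str(uuid.uuid4()),  # lets the pipeline drop duplicates
        "event_type": event_type,       # e.g. "match_end", "item_purchased"
        "source": source,               # e.g. "client" or "game_server"
        "timestamp": clock(),
        "payload": payload,
    }


class EventBatcher:
    """Buffers events and flushes them as newline-delimited JSON."""

    def __init__(self, send, max_batch=100):
        self.send = send  # hypothetical transport to the collection point
        self.max_batch = max_batch
        self.buffer = []

    def record(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send("\n".join(json.dumps(e) for e in self.buffer))
            self.buffer = []
```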
The last decade has seen a massive rise in the popularity of Apache™ Hadoop®, the open-source framework based on published work from Google. The expansion of the Hadoop ecosystem has increased the use of complex batch extract, transform, and load (ETL) operations to format and insert analytics events into a data warehouse. Use of MapReduce sped up the rate at which actionable results were delivered, and this speed in turn helped enable new, more compute-intensive analytics.
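The MapReduce model behind those batch jobs has three phases:

1. The map phase turns each record into key/value pairs.
2. The shuffle phase groups the pairs by key.
3. The reduce phase aggregates each group.

The single-process sketch below counts matches per map from hypothetical `match_end` events. It illustrates the model only and does not use the Hadoop API. Hadoop's contribution is running each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(event):
    """Emit (key, value) pairs: one count per finished match, keyed by map name."""
    if event["event_type"] == "match_end":
        yield event["payload"]["map"], 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


events = [
    {"event_type": "match_end", "payload": {"map": "harbor"}},
    {"event_type": "match_end", "payload": {"map": "desert"}},
    {"event_type": "login", "payload": {}},
    {"event_type": "match_end", "payload": {"map": "harbor"}},
]
pairs = chain.from_iterable(map_phase(e) for e in events)
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```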
Meanwhile, the technologies available in the cloud have continued to evolve. Many of them are available as managed services that are quick to learn and require no dedicated operations staff. Google's latest streaming ETL paradigm provides a unified approach to both batch and stream processing. It is available both as a managed cloud service and as the open source project Apache Beam. Continued decreases in cloud data storage prices now make it possible to keep huge amounts of logs and analytics events in massive, managed cloud databases that optimize the way data is written and read. The latest query engines for these databases can aggregate terabytes of data in seconds. For an example of this, see analyzing 50 billion Wikipedia pageviews in 5 seconds.
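Windowing is one idea that makes a unified batch and streaming model possible. Each event is assigned to a time window by its own timestamp. The same aggregation therefore gives the same answer whether events come from a stored file (batch) or arrive over time, possibly out of order (streaming).

The sketch below counts events in fixed (tumbling) one-minute windows in plain Python. It illustrates the concept only. Apache Beam provides its own windowing transforms, and this is not its API. Events are assumed to be hypothetical `(timestamp_seconds, event_type)` pairs.

```python
from collections import Counter


def fixed_window_counts(events, window_seconds=60):
    """Count events per (window start, event type) using event timestamps."""
    counts = Counter()
    for ts, event_type in events:
        # The window is chosen by event time, not arrival order.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return counts
```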
Google Cloud Platform game analytics solutions
- Building a Mobile Gaming Analytics Platform - a Reference Architecture
- Importing Firebase Analytics Data into Google BigQuery
Solutions for online games follow a common pattern: clients talk to a frontend of services and game servers, which communicate with a backend of analytics and state storage. You can run each of these components on-premises, in the cloud, or in a mixture of the two. For more in-depth patterns, see gaming solutions.
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.