Best practices for using Cloud Spanner as a gaming database

This document describes best practices for using Cloud Spanner as the primary backend database for game state storage. You can use Cloud Spanner in place of common databases to store player authentication data and inventory data. This document is intended for game backend engineers working on long-term state storage, and game infrastructure operators and admins who support those systems and are interested in hosting their backend database on Google Cloud Platform (GCP).

Multiplayer and online games have evolved to require increasingly complex database structures for tracking player entitlements, state, and inventory data. Growing player bases and increasing game complexity have led to database solutions that are a challenge to scale and manage, frequently requiring the use of sharding or clustering. Tracking valuable in-game items or critical player progress typically requires transactions and is challenging to work around in many types of distributed databases.

Cloud Spanner is the first scalable, enterprise-grade, globally distributed, and strongly consistent database service built for the cloud to combine the benefits of relational database structure with non-relational horizontal scale. Many game companies have found it to be well-suited to replace both game state and authentication databases in production-scale systems. You can scale for additional performance or storage by using the GCP Console to add nodes. Cloud Spanner can transparently handle global replication with strong consistency, eliminating your need to manage regional replicas.

This best practices document discusses the following:

  • Important Cloud Spanner concepts and differences from databases commonly used in games.
  • When Cloud Spanner is the right database for your game.
  • Patterns to avoid when using Cloud Spanner for games.
  • Designing your database operations with Cloud Spanner as your game's database.
  • Modeling your data and creating a schema to get the best performance with Cloud Spanner.

Terminology

Entitlements
Games, expansions, or in-app purchases belonging to a player.
Personally identifiable information (PII)
In games, information that typically includes email address and payment account information, such as a credit card number and billing address. In some markets, this information might include a national ID number.
Game database (game DB)
A database that holds player progress and inventory for a game.
Authentication database (auth DB)
A database that includes player entitlements and the PII that the players use when making a purchase. The auth DB is also known as the account DB or player DB. This database is sometimes combined with the game DB, but they are frequently separated in studios or publishers that have multiple titles.
Transaction
A database transaction—a set of write operations that have an all-or-nothing effect. Either the transaction succeeds and all updates take effect, or the database is returned to a state that doesn't include any of the updates of the transaction. In games, database transactions are most critical when processing payments, and when assigning the ownership of valuable in-game inventory or currency.
Relational database management system (RDBMS)
A database system based on tables and rows that reference one another. SQL Server, MySQL, and (less commonly) Oracle® are examples of relational databases used in games. These are frequently used because they can provide familiar methodologies and strong guarantees around transactions.
NoSQL database (NoSQL DB)
Databases that are not structured relationally. These databases are becoming more popular in games because they have a lot of flexibility when the data model changes. NoSQL databases include MongoDB and Cassandra.
Primary key
Usually the column that contains the unique ID for inventory items, player accounts, and purchase transactions.
Instance
A single database. For example, a cluster runs multiple copies of the database software, but appears as a single instance to the game backend.
Node
For the purposes of this document, a single machine running a copy of the database software.
Replica
A second copy of a database. Replicas are frequently used for data recovery and high availability, or to help increase read throughput.
Cluster
Multiple copies of the software running on many machines that together appear as a single instance to the game backend. Clustering is used for scalability and availability.
Shard
An instance of a database. Many game studios run multiple homogeneous database instances, each of which holds a subset of the game data. Each of these instances is referred to as a shard. Sharding is typically done for performance or scalability, sacrificing management efficiency while increasing app complexity.
Hotspot
When a single shard in a distributed database like Cloud Spanner contains records receiving a large portion of all the queries going to the database. This scenario is undesirable because it degrades performance.

Using Cloud Spanner for games

In most cases where you are considering an RDBMS for your game, Cloud Spanner is an appropriate choice because it can effectively replace either the game DB, the auth DB, or in many cases, both.

Game DBs

Cloud Spanner can operate as a single worldwide transactional authority, which makes it an outstanding fit for game inventory systems. Any in-game currency or item that can be traded, sold, gifted, or otherwise transferred from one player to another presents a challenge in large-scale game backends. Often, the popularity of a game can outpace a traditional database's ability to handle everything in a single-node database. Depending on the type of game, the database can struggle with the number of operations required to handle the player load as well as the amount of stored data. This often leads game developers to shard their database for additional performance, or to store ever-growing tables. This type of solution leads to operational complexity and high maintenance overhead.

To help mitigate this complexity, one common strategy is to run completely separate game regions with no way to move data between them. In this case, items and currency cannot be traded between players in different game regions, because inventories in each region are segregated into separate databases. However, this setup sacrifices the preferred player experience, in favor of developer and operational simplicity.

On the other hand, you can allow cross-region trades in a geographically sharded database, but often at a high complexity cost. This setup requires that transactions span multiple database instances, leading to complex, error-prone application-side logic. Trying to get transaction locks on multiple databases can have significant performance impacts. In addition, not being able to rely on atomic transactions can lead to player exploits such as in-game currency or item duplication, which harm the game's ecosystem and community.

Cloud Spanner can simplify your approach to inventory and currency transactions. Even when using Cloud Spanner to hold all of your game data worldwide, it offers read-write transactions with even stronger than traditional atomicity, consistency, isolation, and durability (ACID) properties. With the scalability of Cloud Spanner, it means that data doesn't need to be sharded into separate database instances when more performance or storage is needed; instead, you simply add more nodes. Additionally, the high availability and data resiliency for which games often cluster their databases are handled transparently by Cloud Spanner, requiring no additional setup or management.

Auth DBs

Auth DBs can also be well served by Cloud Spanner, especially if you want to standardize on a single RDBMS at your studio or publisher level. Although auth DBs for games often don't require the scale of Cloud Spanner, the transactional guarantees and high data availability can make it compelling. Data replication in Cloud Spanner is transparent, synchronous, and built-in. Cloud Spanner has configurations offering either 99.99% ("four nines") or 99.999% ("five nines") of availability, with "five nines" corresponding to less than five and a half minutes of unavailability in a year. This type of availability makes it a good choice for the critical authentication path required at the beginning of every player session.

Best practices

This section provides recommendations for how to use Cloud Spanner in game design. It's important to model your game data to benefit from the unique features offered by Cloud Spanner. Although you can access Cloud Spanner by using relational database semantics, some schema design points can help you increase your performance. The Cloud Spanner documentation has detailed schema design recommendations that you can review, but the following sections are some best practices for game DBs.

The practices in this document are based on experiences from customer usage and case studies.

Use UUIDs as player and character IDs

The player table typically has one row for each player and their in-game currency, progress, or other data that doesn't map easily to discrete inventory table rows. If your game allows players to have separate saved progress for multiple characters, like many large persistent massively multiplayer games, then this table typically contains a row for each character instead. The pattern is otherwise the same.

We recommend using a globally unique character or player identifier (character ID) as the primary key of the character table. We also recommend using the Universally Unique Identifier (UUID) v4, because it spreads the player data across DB nodes and can help you get increased performance out of Cloud Spanner.

Use interleaving for inventory tables

The inventory table often holds in-game items, such as character equipment, cards, or units. Typically, a single player has many items in their inventory. Each item is represented by a single row in the table.

Similar to other relational databases, an inventory table in Cloud Spanner has a primary key that is a globally unique identifier for the item, as illustrated in the following table.

itemID type playerID
7c14887e-8d45 1 6f1ede3b-25e2
8ca83609-bb93 40 6f1ede3b-25e2
33fedada-3400 1 5fa0aa7d-16da
e4714487-075e 23 5fa0aa7d-16da
d4fbfb92-a8bd 14 5fa0aa7d-16da
31b7067b-42ec 3 26a38c2c-123a

In the example inventory table, itemID and playerID are truncated for readability. An actual inventory table would also contain many other columns that aren't included in the example.

A typical approach in an RDBMS for tracking item ownership is to use a column as a foreign key that holds the current owner's player ID. This column is the primary key of a separate database table. In Cloud Spanner, you can use interleaving, which stores the inventory rows near the associated player table row for better performance. When using interleaved tables, keep the following in mind:

  • You need to keep the total data in the player row and all their descendant inventory rows under ~2 GiB. This restriction isn't typically an issue with an appropriate data model design.
  • You cannot generate an object without an owner. You can avoid ownerless objects in the game design provided the limitation is known ahead of time.

Design indexing to avoid hotspots

Many game developers implement indexes on many of the inventory fields to optimize certain queries. In Cloud Spanner, creating or updating a row with data in that index generates additional write load proportional to the number of indexed columns. You can improve Cloud Spanner performance by eliminating indexes that aren't used frequently, or by implementing these indexes in other ways that don't impact database performance.

In the following example, there is a table for long-term player high-score records:

CREATE TABLE Ranking (
        PlayerID STRING(36) NOT NULL,
        GameMode INT64 NOT NULL,
        Score INT64 NOT NULL
) PRIMARY KEY (PlayerID, GameMode)

This table contains the player ID (UUIDv4), a number representing a game mode, stage, or season, and the player's score.

In order to speed up queries that filter for the game mode, consider the following index:

CREATE INDEX idx_score_ranking ON Ranking (
        GameMode,
        Score DESC
)

If everyone plays the same game mode called 1, this index creates a hotspot where GameMode=1. If you want to get a ranking for this game mode, the index only scans the rows containing GameMode=1, returning the ranking quickly.

If you want to get a ranking for this game mode, the index only scans the rows containing GameMode=1, thus providing low read latency. However, if everyone plays the same game mode, this index creates a hotspot.

If you change the order of the previous index, you can solve this hotspot problem:

CREATE INDEX idx_score_ranking ON Ranking (
        Score DESC,
        GameMode
)

This index won't create a significant hotspot from players competing in the same game mode, provided their scores are distributed across the possible range. However, getting scores won't be as fast as with the previous index because the query scans all scores from all modes in order to determine if GameMode=1.

As a result, the reordered index solves the previous hotspot on game mode but still has room for improvement, as illustrated in the following design.

CREATE TABLE GameMode1Ranking (
        PlayerID STRING(36) NOT NULL,
        Score INT64 NOT NULL
) PRIMARY KEY (PlayerID)

CREATE INDEX idx_score_ranking ON Ranking (
        Score DESC
)

We recommend moving the game mode out of the table schema, and use one table per mode, if possible. By using this method, when you retrieve the scores for a mode, you only query a table with scores for that mode in it. This table can be indexed by score for fast retrieval of score ranges without significant danger of hotspots (provided the scores are well distributed). As of the writing of this document, the maximum number of tables per database in Cloud Spanner is 2048, which is more than enough for most games.

Separate databases per tenant

Unlike other workloads, where we recommend designing for multitenancy in Cloud Spanner by using different primary key values, for gaming data, we recommend the more conventional approach of separate databases per tenant. Schema changes are common with the release of new game features in live service games, and isolation of tenants at a database level can simplify schema updates. This strategy can also optimize the time it takes to back up or restore a tenant's data, because these operations are performed on an entire database at once.

Avoid incremental schema updates

Unlike some conventional relational databases, Cloud Spanner remains operational during schema updates. All queries against the old schema are returned (although they might return less quickly than usual), and queries against the new schema are returned as they become available. You can design your update process to keep your game running during schema updates when running on Cloud Spanner, provided you keep the preceding constraints in mind.

However, if you request another schema change while one is currently being processed, the new update is queued and won't take place until all previous schema updates have completed. You can avoid this situation by planning larger schema updates, instead of issuing many incremental schema updates in a short period. For more information about schema updates, including how to perform a schema update that requires data validation, see Cloud Schema updates.

Consider database access and size

When you develop your game server and platform services to use Cloud Spanner, consider how your game accesses the database and how to size the database to avoid unnecessary costs.

Use native drivers and libraries

When you develop against Cloud Spanner, consider how your code interfaces with the database. Cloud Spanner offers native client libraries for many popular languages, which are typically feature-rich and performant. JDBC drivers are also available, which support data manipulation language (DML) and data definition language (DDL) statements. In cases where Cloud Spanner is used in new development, we recommend using the Cloud Client Libraries for Cloud Spanner. Although typical game engine integrations don't have much flexibility in language selection, for platform services accessing Cloud Spanner, there are cases of gaming customers using Java or Go. For high throughput applications, select a library where you can use the same Cloud Spanner client for multiple sequential requests.

Size the database to testing and production needs

During development, a single-node Cloud Spanner instance is likely sufficient for most activities, including functional testing.

Evaluate Cloud Spanner needs for production

When you move from development to testing, and then into production, it's important that you reevaluate your Cloud Spanner needs to insure your game can handle live player traffic.

Before you move to production, load tests are crucial to verify that your backend can handle the load during production. We recommend running load tests with double the load you expect in production in order to be prepared for spikes in usage and cases where your game is more popular than anticipated.

Run load tests using real data

Running a load test with synthetic data isn't sufficient. You should also run load tests using data and access patterns as close as possible to what is expected in production. Synthetic data might not detect potential hotspots in your Cloud Spanner schema design. Nothing is better than running a beta test (open or closed) with real players to verify how Cloud Spanner behaves with real data.

The following diagram is an example from a game studio that illustrates the importance of using beta tests to load test.

List of players names and their rank for load testing.

The studio prepared some representative data based on trends they had from a previous game they had operated for a couple of years. This table contains sample player's names and their rank.

Based on this information, the data model for Cloud Spanner indexes players by rank. In load tests, this model does an acceptable job of distributing the data across multiple shards, as illustrated in the following diagram.

Players distributed across Cloud Spanner servers by their rank.

The synthetic data used in the load test is similar to the eventual steady state of the game (where player ranks are well distributed). Players are distributed to various Cloud Spanner servers based on their rank.

However, their game design dictates that all players started at rank 1. The following diagram shows all players at launch on the same Cloud Spanner server while the other Cloud Spanner servers are unused.

Players at launch with the same rank creating a hotpot in a single Cloud Spanner instance.

This creates a hotspot during the launch window because of the massive influx of new players all going to the same shard instead of being distributed across the shards, impacting the game's availability.

After the launch, the hotspot issue is resolved in the following diagram because the player's are evenly distributed across the shards.

Adding a shard column to the schema evenly distributes players at launch by shardid instead of rank.

By adding a ShardID column to the schema, and indexing this new column, players are evenly distributed across the available Cloud Spanner servers.

Because no beta test was run, the studio didn't realize they were testing by using data with incorrect assumptions. Although synthetic load tests are a good way to validate how many queries per second (QPS) your instance can handle, a beta test with real players is necessary to validate your schema and prepare a successful launch.

Size the production environment to anticipate peak demand

Major games often experience the peak of their traffic at launch. Building up a scalable backend applies not only to platform services and dedicated game servers, but to databases as well. Using GCP solutions such as App Engine, you can build frontend API services that can scale up quickly. Even though Cloud Spanner offers the flexibility to add and remove nodes online, it isn't an autoscaling database. You need to provision enough nodes to handle the traffic spike at launch.

Based on the data you gathered during load testing or from any public beta testing, you can estimate the number of nodes required to handle requests at launch. It's a good practice to add a few nodes as buffer in case you get more players than expected. You should always size the database based on not exceeding an average CPU usage of 65%.

Pre-warm the database before launch

One other important thing to prepare before launch is warming the database up.

Cloud Spanner in a cold state can't provide you the nodes you need for launch without warming up first.

Cloud Spanner is a distributed database, which means that as your database grows, Cloud Spanner divides your data into chunks called splits. Individual splits can move independently from each other and get assigned to different servers, which can be in different physical locations. For more information, see Database splits.

A split is defined as a range of rows. In other words, it contains a subset of your table. Cloud Spanner splits data based on load and size. That way, splits can be dynamically moved across Cloud Spanner nodes to balance the overall load on the database. The more data you insert into Cloud Spanner, the more splits are generated.

In the following diagram, there are four nodes.

Data in one node when Cloud Spanner is cold.

Because you have no data in Cloud Spanner, when you start writing data you only write to a single node. Cloud Spanner is currently in a cold state.

The following diagram illustrates the split to the other nodes.

Data split between nodes when Cloud Spanner is warm.

As the data comes into the system, Cloud Spanner starts to split that data to rebalance the load across the four provisioned nodes. Now Cloud Spanner is in a warm state.

You want to launch your game when Cloud Spanner is in a warm state with splits already balanced across all of the nodes. In order to warm your database up, follow these steps:

  1. Make sure that the table primary keys you generate for your load test are in the same keyspace (have the same statistical properties) as the keys you are using for real production traffic
  2. Run a load test two days before your launch. Run the load test for at least one hour at the expected peak load. The load test causes Cloud Spanner to create more splits due to load-based splitting.
  3. After the load test is complete, you can delete the rows generated by your load test from your tables, but don't delete the tables themselves. This keeps the splits available for your launch window.

Monitor and understand performance

Any production database requires comprehensive monitoring and performance metrics. Cloud Spanner comes with built-in metrics in Stackdriver. Where possible, we recommend incorporating the provided gRPC libraries into your game backend process because these libraries include OpenCensus tracing. OpenCensus tracing lets you see query traces in Stackdriver as well as other supported open source tracing tools.

In Stackdriver, you can see details on your Cloud Spanner usage, including data storage and CPU usage. For most cases, we recommend that you base your Cloud Spanner scaling decisions on this CPU usage metric or observed latency. For more information about suggested CPU usage for optimized performance, see Best practices for instances.

Cloud Spanner offers query execution plans. You can review these plans in the GCP Console, and contact support if you need help understanding your query performance.

When you're evaluating performance, keep short cycle testing to a minimum because Cloud Spanner transparently splits your data behind the scenes to optimize performance based on your data access patterns. You should evaluate performance by using sustained, realistic query loads.

When removing data, delete rows instead of re-creating tables

When you're working with Cloud Spanner, newly created tables haven't yet had an opportunity to undergo load-based or size-based splitting to improve performance. When you delete data by dropping a table and then recreating it, Cloud Spanner needs data, queries, and time to determine the correct splits for your table. If you are planning to repopulate a table with the same kind of data (for example, when running consecutive performance tests), you can instead run a DELETE query on the rows containing data you no longer need. For the same reason, schema updates should use the provided Cloud Spanner API, and should avoid a manual strategy, such as creating a new table and copying the data from another table or a backup file.

Select a data locality to meet compliance requirements

Many games must comply with data locality laws such as GDPR when played worldwide. To help support your GDPR needs, see the GCP and the GDPR whitepaper and select the correct Cloud Spanner regional configuration.

What's next

Was this page helpful? Let us know how we did:

Send feedback about...