Best practices for mobile game online architectures on Google Cloud

This document describes best practices for running an API-driven mobile game backend on Google Cloud, and provides a reference that game developers can use as a starting point to design an online architecture for mobile games. The best practices can apply to any type of mobile game. However, this document focuses on games that store player progress and account information in a database and access that data through a custom interface API written by the game developers.

This document is for teams who develop mobile video games like Niantic's Pokémon GO, Nintendo's Super Mario Run, or King's Candy Crush Saga. The best practices in this document aren't for games of chance (card games and casino games) or fantasy sports apps (for example, fantasy football), which most commonly scale like a typical web app or social app.

The hit-driven nature of a game can drive massive surges in demand during peak hours. Because your app might be featured by an app store or embraced by the streaming community, it's important to consider success-disaster scenarios, and ensure that you have a clear path to scaling when a game becomes popular. Making informed decisions during development can help minimize risk.

Estimate your expected user load

When you design the online backend of your mobile game, it's important to have a best-guess estimate of user load. If you design your architecture to use most of its resources at the expected load, it might fail if the game gets attention from the larger gamer community and the architecture cannot scale to meet that demand. Failure to scale can result in lost revenue and damage to your studio's reputation. It can be a challenge to design an architecture that runs well at your expected load but has a clear path to a much higher scale if you have unexpected success.

User load estimates are always based on many pieces of data, but there are two essential categories to include:

  • Number of players and frequency of play sessions: This is usually an educated guess based on the number of players playing similar games in the market and on your budget to acquire users through marketing spend.
  • API load caused by each player: This can be measured through comprehensive load testing.

Make an initial estimate

When you make an initial estimate, consider all the factors that you have available, such as the following:

  • Success of past games or similar games in the market
  • Popularity of any included intellectual property (IP)
  • Timing of the release into the market
  • Number of pre-registrations or cross-promotions in the rest of your app portfolio
  • Marketing budget

After you estimate the number of users, it's common practice to create a best-case scenario of four times (4X) the estimate. However, we recommend that you also consider a success-disaster scenario in which a game goes viral or has otherwise unexpected success. Some studios plan for 10 times (10X) their user estimate, but past game launches on Google Cloud have exceeded their estimates by 20 times (20X) or even 40 times (40X) in extreme circumstances. Even if those figures are highly unlikely, it's valuable to calculate these numbers and validate that your architecture can scale to those levels.
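The scenario arithmetic can be sketched in a few lines of Python. Only the 4X, 10X, 20X, and 40X multipliers come from this document. The scenario names, the baseline of 50,000 peak concurrent players, and the rate of 6 API calls per player per minute are hypothetical placeholders that you would replace with your own estimate and load-test measurements.

```python
# Multipliers are taken from the text; the scenario names are illustrative.
SCENARIOS = {"expected": 1, "best case": 4, "cautious": 10, "viral": 20, "extreme": 40}


def scenario_loads(peak_concurrent_players, requests_per_player_per_minute):
    """Return the peak API requests per second for each scenario."""
    base_qps = peak_concurrent_players * requests_per_player_per_minute / 60
    return {name: base_qps * factor for name, factor in SCENARIOS.items()}


if __name__ == "__main__":
    # Hypothetical inputs: 50,000 concurrent players, 6 API calls per minute each.
    for name, qps in scenario_loads(50_000, 6).items():
        print(f"{name:>10}: {qps:,.0f} requests/second")
```

Each resulting figure is a target to validate against your compute and database layers, not a prediction.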

Run a load test

Knowing the expected number of users is insufficient to understand the scaling needs of your mobile game. It is critical to run load tests under conditions as close to real-world circumstances as possible. Run a load test with closed beta testers using a near-final version of the game. Load testing lets you profile the performance of the state storage database and the API layer to ensure that enough headroom is available. Real users often create usage patterns that developers are unable to foresee. Therefore, it's important to get some live player-usage profiling to use as a model for larger-scale load tests. We recommend that you use a load testing framework to replicate the user patterns from the beta test at the scale determined by the initial estimate you calculated in the previous section.
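As a rough illustration of turning beta-test observations into load-test targets, the following sketch derives a per-endpoint call rate from beta traffic and scales it to a target player count. The event format, endpoint names, and numbers are assumptions made for this example; a real profile would come from your own API logs and would feed the load testing framework you choose.

```python
from collections import Counter


def per_player_rates(events, player_minutes):
    """Return calls per player-minute for each endpoint seen in the beta.

    events is an iterable of (player_id, endpoint) pairs, and player_minutes
    is the total play time observed across all beta players.
    """
    counts = Counter(endpoint for _player, endpoint in events)
    return {endpoint: n / player_minutes for endpoint, n in counts.items()}


def target_qps(rates, concurrent_players):
    """Scale per-player rates to requests per second at a target player count."""
    return {ep: rate * concurrent_players / 60 for ep, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical beta sample: two players, 10 player-minutes in total.
    beta_events = [
        ("p1", "/login"), ("p1", "/save"), ("p1", "/save"),
        ("p2", "/login"), ("p2", "/save"), ("p2", "/save"),
    ]
    rates = per_player_rates(beta_events, player_minutes=10)
    for endpoint, qps in sorted(target_qps(rates, 60_000).items()):
        print(f"{endpoint}: {qps:.0f} requests/second")
```

Keeping the mix per endpoint matters because a save-heavy workload stresses the database differently from a login-heavy one, even at the same total request rate.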

When you run a large-scale load test, contact your Google Cloud sales team and file an appropriate Cloud Customer Care ticket for the window of time when you plan to stress test. Filing a Customer Care ticket enables the team to help you proactively request quota increases where necessary. It also helps make sure that a Customer Care engineer is available to answer your questions in case a Google Cloud product doesn't behave the way you expect it to.

Validate against the reference architecture

The following diagram provides a reference architecture for the best practices in this document:

A mobile game reference architecture.

In the preceding diagram, your game clients connect to your mobile game backend through a load balancer. The backend has a direct connection to your player record database, with an optional high-speed cache layer in front of it that stores and retrieves player progress, entitlements, and other data. The backend emits operations metrics and logs to Google Cloud's operations suite. The metrics and logs are critical for monitoring your backend performance, and are also accessible to your data warehouse. Analytics specialists can directly access the data warehouse using BigQuery, and AutoML can be used to generate models used to predict spend and churn. These predictions can then be made available to your game backend. The following components are described in detail later in this document:

  1. Compute used for client-facing APIs
  2. Database used for state storage
  3. Google Cloud's operations suite observability and monitoring
  4. Analytics

Some mobile games offer real-time multiplayer using a dedicated game server or TURN/STUN servers. The best practices in this document don't explicitly include such servers, but the practices are compatible with game servers. If you need to run game servers at scale for your mobile game backend, consider using Google Cloud's Game Servers product, which can help simplify management and administration.

Compute options

Google Cloud provides several compute options for your mobile game backend, from fully managed scalable options like App Engine, to fully customizable environments like Google Kubernetes Engine (GKE). It's important to understand your needs in detail and decide accordingly. All options in the following sections offer full integration with Cloud Load Balancing so that your HTTP(S) traffic can take advantage of seamless scaling. The options also include Google Cloud Armor features like enterprise-grade DDoS protection.

Use App Engine for proven scalable serverless

App Engine is Google Cloud's fully managed serverless platform that lets you focus on writing code without having to manage underlying infrastructure. You can configure App Engine to scale according to your game's needs. It also enables faster iteration times for your developers by building and deploying directly from source with a single command. App Engine is an ideal choice for teams that are small or have limited experience with scaling infrastructure operations. It is proven at scale through multiple mobile game launches, including launches from Nintendo, Madfinger Games, Pocket Gems, and Backflip Studios.

When you evaluate whether App Engine is right for your game, it's important to understand that instances can be started or stopped based on the query rate from players. Therefore, service designs shouldn't plan to keep state in memory between user requests. If you need to maintain state between requests, you should store and retrieve that state in a state storage database (discussed in the next section) or use a separate cache like Memorystore (Memcached or Redis).
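A minimal sketch of a handler that keeps no state in instance memory between requests follows. The `InMemoryStore` class is a hypothetical stand-in for an external store such as your state storage database or Memorystore, and `handle_collect_coins` is an invented endpoint used only for illustration.

```python
import json


class InMemoryStore:
    """Stand-in for an external store (a database or Memorystore in production)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def handle_collect_coins(store, player_id, coins):
    """Handle one request using only state loaded from the external store."""
    key = f"player:{player_id}"
    raw = store.get(key)
    state = json.loads(raw) if raw else {"coins": 0}
    state["coins"] += coins
    store.set(key, json.dumps(state))
    return state


if __name__ == "__main__":
    store = InMemoryStore()
    # Either call could be served by a different instance; both see the same state.
    handle_collect_coins(store, "p1", 5)
    print(handle_collect_coins(store, "p1", 3))
```

The read-modify-write shown here is not atomic. With a real store, you would typically use a transaction or an atomic increment so that concurrent requests for the same player don't overwrite each other.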

App Engine apps might require extra time or resources to make them run efficiently in other runtime environments. If you require a single runtime target that can be deployed in multi-cloud or hybrid cloud environments, we recommend Cloud Run or Google Kubernetes Engine instead.

Use Cloud Run for new serverless apps

Cloud Run lets you develop a new app in containers for your game backend, without having to manage Kubernetes clusters. Cloud Run can automatically scale your API containers to meet the request needs of your player base. It offers many of the benefits of App Engine, including a fully managed runtime environment where the infrastructure is abstracted away and scaling is handled automatically according to the configuration you select. Because it's built on the open standard Knative, it can be simpler to write portable services when you use Cloud Run. Cloud Run apps run in containers on Kubernetes, which provides a clear path to moving to self-managed Kubernetes if you need more control in the future.

Use GKE for full control over your workload

Google Kubernetes Engine is an option for developers who need more control or who work with experienced operations teams. If your teams already use Kubernetes for their app stacks, GKE lets them run their game backend alongside their existing services, using the same Kubernetes interface and command-line interface (CLI). If your teams want to run apps on multiple clouds or on-premises, GKE provides a single-target platform for apps built for the cloud (cloud-native apps). Multiple games have launched successfully to massive scale on GKE, including Pokémon GO.

State storage databases

When you select the database for your mobile game, you need to consider how to scale and manage growing player bases and increasing game complexity. Cloud Spanner and Firestore are feature rich, offer a managed experience, and have proven mobile game success stories at scale. Google Cloud also offers Cloud SQL, a managed relational database service for MySQL, PostgreSQL, and SQL Server. However, Cloud SQL can be challenging to scale because manual database sharding or clustering can introduce significant difficulty and complexity to your state storage layer, leading to unwanted downtime and customer impact.

Use Cloud Spanner for global games with trading between users

Cloud Spanner is a fully managed relational database with unlimited scale, strong consistency, and up to 99.999% availability. It features SQL semantics and a familiar interface for developers who are used to working with relational databases. Cloud Spanner can be deployed globally but accessed regionally, so you have the simplicity of a single database instance with the performance of distributed replicas.

Cloud Spanner provides infinite scale, so it works well for player profiles and inventory storage. It also provides transaction guarantees, which let you offer reliable player-to-player trading or auction house functionality to your game customers. Cloud Spanner provides several tools for migration, development, observability, and introspection for developer onboarding and database administration. Cloud Spanner gradually scales to millions of queries per second (QPS). For a big launch, such as one that expects more than 1,000 QPS on day 1, we recommend that you follow the best practices of warmup and benchmarking.

Cloud Spanner can scale to billion-user use cases, and provides the flexibility to manage the scale to meet the performance you need. Cloud Spanner has significant use in the mobile game space; for information about how to use it in your game, see Best practices for using Cloud Spanner as a gaming database.

Use Firestore for development velocity and low operational overhead

Firestore is a fully managed, scalable NoSQL document database. It offers a streamlined developer experience, and it doesn't require schema updates when you want to store new information. It also offers strong consistency, transactional guarantees, and up to 99.999% availability. Your mobile game can also access Firestore directly by using the Firebase client libraries.

A typical approach is to use a single Firestore document per player, and store all of their progress in that document in a hierarchy that works well with your game design. When you design a game to use Firestore, consider Firebase limitations and Firestore best practices. Based on these best practices, workloads that require frequent updates to the same document might not be a good fit. Each Firestore database also has a limit of 10,000 total updates per second across all documents in its standard configuration. If your use case might exceed this limit, consult with your Google Cloud technical contact and consider Datastore mode.
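The single-document-per-player approach and the update-rate check can be sketched as follows. The field names in the document are hypothetical and should follow your own game design. The 10,000 updates per second figure is the one quoted above, and the estimate assumes, for simplicity, that every connected player saves at a fixed interval.

```python
# Figure quoted in the text for a Firestore database in its standard configuration.
UPDATES_PER_SECOND_LIMIT = 10_000


def new_player_document(display_name):
    """Return one nested document that holds all of a player's progress."""
    return {
        "profile": {"displayName": display_name, "level": 1},
        "progress": {"currentStage": "1-1", "completedStages": []},
        "inventory": {"coins": 0, "items": {}},
    }


def estimated_updates_per_second(concurrent_players, seconds_between_saves):
    """Estimate database-wide updates if each player saves at a fixed interval."""
    return concurrent_players / seconds_between_saves


def fits_standard_limit(concurrent_players, seconds_between_saves):
    """Return True if the estimated update rate stays within the quoted limit."""
    rate = estimated_updates_per_second(concurrent_players, seconds_between_saves)
    return rate <= UPDATES_PER_SECOND_LIMIT


if __name__ == "__main__":
    print(new_player_document("PlayerOne"))
    # Hypothetical scenario: 200,000 concurrent players.
    print(fits_standard_limit(200_000, seconds_between_saves=30))
    print(fits_standard_limit(200_000, seconds_between_saves=10))
```

A check like this is only a first filter. Load testing with realistic save patterns remains the way to confirm whether a design fits.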

Datastore mode makes some trade-offs to enable more updates per second, and the Datastore best practices are slightly different from those for Firestore. Extremely high-scale games like Pokémon GO have successfully launched using Datastore mode. The games were able to scale to meet overwhelming demand of more than 50 times (50X) the estimated player traffic.

Firestore can handle scaling for you automatically. However, to ensure smooth scaling for sudden increases in usage (for example, following a major marketing spend), you should have a capacity planning conversation in advance with your Google Cloud account manager.

Reevaluate caching as a performance optimization

To optimize performance, a common strategy in mobile games is to put an in-memory cache in front of the database. The in-memory cache holds data that is frequently read, or it batches low-priority updates. This strategy can add design complexity to the architecture, and often isn't needed with a scalable, managed database like Cloud Spanner or Firestore, which can handle the read and write loads. If you load test your database access patterns and still need a cache, then consider a managed option like Memorystore for Redis or Memcached to reduce your administration overhead.
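To make the added design complexity concrete, here is a minimal read-through cache sketch with a time-to-live (TTL). The class name, the TTL value, and the loader function are assumptions for illustration. A production cache would live in a shared service such as Memorystore instead of in process memory.

```python
import time


class ReadThroughCache:
    """Serve reads from memory and fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh cache hit
        value = self._load(key)  # miss or expired: read from the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a cached entry; call this after every write to the database."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    def load_player(player_id):
        print(f"database read for {player_id}")
        return {"id": player_id, "level": 1}

    cache = ReadThroughCache(load_player, ttl_seconds=30)
    cache.get("p1")  # reads from the database
    cache.get("p1")  # served from the cache
```

Even this small version introduces staleness windows and an invalidation step that every write path must remember, which is the complexity to weigh against the measured benefit.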

Select a data locality to meet compliance requirements

When played worldwide, many games must comply with data locality laws like GDPR. To help support your GDPR needs, see the Google Cloud and the GDPR whitepaper and select the correct Cloud Spanner or Firestore regional configuration.

Observability

We recommend that you implement observability early. Observability of your app and backend infrastructure is important for finding and fixing problems quickly, enabling faster development cycles, and reducing customer impact when something goes wrong. You can save time and money by adopting a format that works well with Google Cloud's operations suite at the beginning of development.

Use open source standards to get your app metrics into Cloud Monitoring

All of your Google Cloud resources have instrumentation already integrated into Cloud Monitoring and visible in the Google Cloud console. Therefore, we recommend that you also instrument your mobile game backend to integrate with Cloud Monitoring. Integrating with Cloud Monitoring lets you use a unified-interface (sometimes called a single pane of glass) monitoring dashboard for your infrastructure and your app. Using a unified interface lets you view key metrics for your infrastructure and your app side by side, and helps you to find and isolate issues quickly.

When you implement custom metrics and distributed tracing in your app, we recommend that you use OpenTelemetry, a free, open source project formed by merging the OpenCensus and OpenTracing projects. OpenTelemetry provides vendor-neutral support for collecting metrics and traces across many languages, and it can export them into many observability products, including Cloud Monitoring and Cloud Trace. For more information, see Custom metrics with OpenCensus.

Use structured logging

When you select a logging format, we recommend that you use structured logging, and sort any interesting features of your logs into JSON fields. This implementation lets you quickly sort, search, and filter your logs in Cloud Logging. Many programming languages have popular structured logging libraries or modules that can export to Cloud Logging. Google Cloud also offers many idiomatic Logging Client Libraries.
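As one possible approach that uses only the Python standard library, the following sketch writes each log entry as a single JSON object per line. The `severity` and `message` keys follow the field names that Cloud Logging recognizes in JSON payloads on several Google Cloud runtimes; confirm the behavior for your runtime in the Cloud Logging documentation. The logger name and the `playerId` and `sku` fields are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Values passed as extra={"fields": {...}} become top-level JSON keys.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(name="game-backend", stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "purchase completed",
        extra={"fields": {"playerId": "p-123", "sku": "gem_pack_small"}},
    )
```

Putting identifiers like the player ID in their own fields, instead of interpolating them into the message string, is what makes the logs filterable later.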

Create a BigQuery log sink

If you need to analyze your logs later, or keep them due to data retention laws in the region where you operate, set up a BigQuery sink for your logs in advance. Only new logs that are generated after a sink is created are written to BigQuery. If you are writing large volumes of logs to BigQuery, we recommend that you select the option to use partitioned tables.

Analytics

We recommend that you format your analytics for the future. When you decide which events and metrics your game writes to your analytics backend, consider what format is easiest for you to data-mine for insights. Although you can use extract, transform, and load (ETL) to copy the data your app writes into a format that works well for analytics queries, it can take time and money to do so. Investing in the design of your analytics output format can lead to significant cost savings and the possibility of real-time analytics insights. We recommend that you review presentations and testimonials from Square Enix, King, and LINE GAMES. These presentations can provide you with real-world insights into using Google Cloud's analytics products to improve your mobile games.

Use batch processing for existing formats

If you want to analyze metrics data that's in an output format that you don't control (for example, data from a third-party integration or service), we recommend that you start by saving the metrics data to Cloud Storage. If the data format is supported, you can query it directly from the BigQuery interface using BigQuery federated queries. If the data format isn't supported, you can use ETL to copy the data from Cloud Storage using Dataflow or other tools, and then store the resulting formatted data in BigQuery alongside data from other sources. We recommend that you set up a regular batch job to save costs instead of streaming, unless you have an urgent need for the data in real time. For more information about this approach, see Optimizing large-scale ingestion of analytics events and logs.

Predict churn and spend with proven models

You might already be using Firebase for your mobile game for one of its many other features like Remote Config, in-app messaging, or Firestore client libraries. Firebase also offers built-in churn and spend prediction machine learning (ML) models. You can integrate Remote Config personalization to apply ML to your analytics data, which can create dynamic user segments based on your users' predicted behavior. This data can be used to trigger other Firebase features, or exported to BigQuery for more flexibility.

Normalize data for AutoML Tables custom-model training

Generating an effective ML model typically requires extensive machine learning expertise to select relevant features and tune hyperparameters. However, following data preparation guidelines improves the ability of the latest automated tools to handle these tasks for you and generate a useful model on your behalf. After a model is generated, it can be hosted on Google Cloud to do online or batch predictions—for example, predicting if a player will make a purchase in the game, or will quit playing.

Although analytics events and player data are useful for traditional analytics queries and business intelligence metrics, a different format is needed to train an ML model. A common use case for ML in mobile games is to make a custom model to predict when players will first spend money in the game. AutoML Tables can greatly simplify the training process. For a general overview, see the AutoML Tables documentation Preparing your training data and Best practices for creating training data.

Multiple game studios and publishers have seen excellent results by using a daily-rollup format as the basis for training. A daily rollup is a normalized row format that has one field for each significant analytics event, containing a cumulative count of the number of times the player has triggered the event up until that day. This row provides a daily snapshot of all the potentially important events a player has triggered so far, along with a true-or-false "has made a purchase" flag.
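A daily rollup can be built from raw events with a short aggregation like the following sketch. The event tuple layout, the event names, and the `has_made_purchase` column name are assumptions for illustration. This version emits a row only for days on which a player triggered at least one event.

```python
from collections import Counter, defaultdict


def daily_rollups(events, tracked_events, purchase_event="purchase"):
    """Build one row per player per active day with cumulative event counts.

    events is an iterable of (player_id, day, event_name) tuples, where day
    is an ISO date string so that sorting it is chronological.
    """
    by_player_day = defaultdict(Counter)
    for player_id, day, name in events:
        by_player_day[(player_id, day)][name] += 1

    rows = []
    running = defaultdict(Counter)  # player_id -> cumulative counts so far
    for player_id, day in sorted(by_player_day):
        running[player_id].update(by_player_day[(player_id, day)])
        totals = running[player_id]
        row = {"player_id": player_id, "day": day}
        row.update({name: totals[name] for name in tracked_events})
        row["has_made_purchase"] = totals[purchase_event] > 0
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        ("p1", "2024-01-01", "level_complete"),
        ("p1", "2024-01-01", "level_complete"),
        ("p1", "2024-01-02", "level_complete"),
        ("p1", "2024-01-02", "purchase"),
    ]
    for row in daily_rollups(sample, ["level_complete", "ad_watched"]):
        print(row)
```

Because every row has the same columns regardless of which events occurred, the output can be loaded into a table and used for training without further reshaping.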

The process described in the AutoML Tables quickstart documentation can result in high-quality models when training with data formatted in this way. The model can then be given a daily-rollup row and provide predictions of how likely it is that the player will make a purchase. Similar approaches to formatting data can also be used alongside different flags to train models to make different predictions, including churn or other player behaviors. Making an up-front investment in building normalized data formats can help you rapidly try out models to predict any player action you can imagine. This modeling can potentially help you monetize your game or prioritize features that result in desirable player outcomes.

Perform analytics on your Cloud Spanner game database

Cloud Spanner also lets administrators and analytics specialists access data without affecting the game's database traffic. BigQuery-Cloud Spanner federation lets BigQuery query data that resides in Cloud Spanner in real time, without copying or moving data. Cloud Spanner also supports exporting data using Dataflow templates. You can analyze the exported data in Looker or in the Google Cloud console, or store it in other analytics platforms of your choice.

Distribution, notifications, and other topics

Mobile game development is a large and varied field. Although every aspect cannot be covered in one guide, the following sections describe additional important considerations.

Use Cloud CDN to distribute your game assets

Cloud CDN can distribute your game assets to mobile clients, and it has built-in Cloud Monitoring and Cloud Logging integrations. If you have an existing vendor relationship, most major CDNs can use Cloud Storage as an origin server.

Reduce abusive behaviors using reCAPTCHA

Although reCAPTCHA isn't technically a part of your backend infrastructure, it can be a valuable integration into your client. It uses adaptive challenges to reduce abusive activities in your app, and for mobile games it is often used to lower the number of automated user (bot) registrations. For more information, see the reCAPTCHA documentation.

Push notifications to clients with Firebase Cloud Messaging

If your mobile game needs to send push notifications or offer users the ability to message each other, consider Firebase Cloud Messaging (FCM). FCM is a cross-platform messaging service that lets you reliably send messages at no cost. It can also be used to send data messages, which lets you determine completely what happens in your app code. You can write your own messaging backend app or use serverless Cloud Functions to create the messages, and then send them using the Firebase Admin SDK or the FCM server protocols.

Simplify game configuration distribution

A common approach to game balancing in mobile games is to have most gameplay parameters defined in data. You can then securely distribute updates to clients when you need to fix parameters like a drop rate or weapon attack stat. Firebase Remote Config is designed to let you change the behavior and appearance of your app without requiring users to download an app update. It lets you define default values in your app, which you can override for all users or for specific segments of your user base by using the Firebase console, or programmatically from the Remote Config backend APIs.
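The idea of in-app defaults with layered overrides can be sketched locally as follows. This is not the Firebase Remote Config API; the function, parameter names, segment names, and values are all hypothetical and only illustrate the resolution order of defaults, then overrides for all users, then overrides for specific segments.

```python
# Hypothetical gameplay parameters that ship with the app as defaults.
DEFAULTS = {"drop_rate": 0.05, "sword_attack": 12}


def resolve_config(defaults, global_overrides, segment_overrides, player_segments):
    """Return the effective parameters for one player.

    Later layers win: defaults, then overrides for all users, then overrides
    for each segment the player belongs to, in the order given.
    """
    config = dict(defaults)
    config.update(global_overrides)
    for segment in player_segments:
        config.update(segment_overrides.get(segment, {}))
    return config


if __name__ == "__main__":
    global_overrides = {"drop_rate": 0.04}  # fix applied to all users
    segment_overrides = {"new_players": {"sword_attack": 15}}
    print(resolve_config(DEFAULTS, global_overrides, segment_overrides, ["new_players"]))
    print(resolve_config(DEFAULTS, global_overrides, segment_overrides, []))
```

Keeping defaults in the app means that a client that cannot reach the configuration service still plays with sensible values.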

Evaluate ML for game balance

Research into using ML for game balance has generated several successful case studies presented at GDC and other events. Many of these case studies come from custom solutions built by data scientists and ML engineers, and aren't easily replicable without an experienced team. If you want to evaluate ML for game balance or as an AI opponent, tools like AutoML Tables can help you experiment with custom models without extensive ML expertise. To predict player behaviors, like their selection of items or next moves, use approaches similar to those described in Normalize data for AutoML Tables custom-model training earlier in this document.

What's next