Building scalable applications with Firestore

This document describes when to use Firestore to build large applications, and it provides solutions for infrastructure administrators who manage database systems for those applications. Using Firestore together with other products in Google Cloud simplifies provisioning and maintaining a database, so that you can focus on developing your app instead of capacity planning.

Firestore features and limitations

Firestore is designed for mobile and web applications and for storing hierarchical, transactional data that has a flexible, non-relational schema. When you evaluate Firestore as a potential database solution, make sure that its quotas and limits are appropriate for your use cases. Firestore is versatile and applicable in many instances, but other Google Cloud database products might be better for certain scenarios. When deciding whether to use Firestore or a different solution, consider the following factors.

Storage

Firestore can store any amount of data. It handles data volumes from kilobytes to petabytes in the same way, without affecting performance.

Real-time updates

Firestore provides real-time updates by letting clients listen to a document and use queries to get real-time updates. You provide a callback that immediately creates a document snapshot with the current contents of a single document. Each time the document contents change, another call updates the document snapshot.
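
The listener pattern described above can be sketched with a small in-memory model. This is an illustration only, not the Firestore client library: the InMemoryDocument class, its on_snapshot and update methods, and the "score" field are hypothetical names chosen for the example, and a snapshot is simplified to a plain dictionary.

```python
from typing import Any, Callable, Dict, List

Snapshot = Dict[str, Any]


class InMemoryDocument:
    """Hypothetical stand-in for a single document; not the Firestore SDK."""

    def __init__(self, data: Snapshot) -> None:
        self._data = dict(data)
        self._listeners: List[Callable[[Snapshot], None]] = []

    def on_snapshot(self, callback: Callable[[Snapshot], None]) -> Callable[[], None]:
        """Register a listener and call it once with the current contents."""
        self._listeners.append(callback)
        callback(dict(self._data))

        def unsubscribe() -> None:
            if callback in self._listeners:
                self._listeners.remove(callback)

        return unsubscribe

    def update(self, changes: Snapshot) -> None:
        """Apply the changes, then push a fresh snapshot to every listener."""
        self._data.update(changes)
        for listener in list(self._listeners):
            listener(dict(self._data))


received: List[Snapshot] = []
doc = InMemoryDocument({"score": 1})
stop = doc.on_snapshot(received.append)  # Fires immediately with {"score": 1}.
doc.update({"score": 2})                 # Fires again with the new contents.
stop()
doc.update({"score": 3})                 # Not delivered after unsubscribing.
```

In this sketch the callback runs once on registration and once per change, which mirrors the initial snapshot followed by update snapshots that the paragraph describes.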

Offline data persistence

Firestore provides offline data persistence with full offline support for mobile and web clients. You can access and update your data while you're offline, and then automatically sync changes to the cloud when you're back online. Built-in offline support uses a local cache to serve and store data, so your app remains responsive regardless of network latency or internet connectivity.
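
The offline behavior can be pictured as a local cache that serves every read and queues writes until the connection returns. The following is a minimal in-memory sketch under that assumption. The OfflineFirstStore class and its methods are hypothetical and don't reflect how the client libraries implement persistence or resolve conflicts.

```python
from typing import Any, Dict, List, Optional, Tuple


class OfflineFirstStore:
    """Hypothetical model of a local cache in front of a remote store."""

    def __init__(self) -> None:
        self.remote: Dict[str, Any] = {}          # Stands in for the cloud database.
        self.cache: Dict[str, Any] = {}           # Local cache that serves all reads.
        self.pending: List[Tuple[str, Any]] = []  # Writes made while offline.
        self.online = True

    def write(self, key: str, value: Any) -> None:
        """Update the cache at once; queue the write if there is no connection."""
        self.cache[key] = value
        if self.online:
            self.remote[key] = value
        else:
            self.pending.append((key, value))

    def read(self, key: str) -> Optional[Any]:
        """Serve reads from the local cache, online or offline."""
        return self.cache.get(key)

    def reconnect(self) -> None:
        """Replay queued writes in order once connectivity returns."""
        self.online = True
        for key, value in self.pending:
            self.remote[key] = value
        self.pending.clear()


store = OfflineFirstStore()
store.online = False
store.write("note", "drafted offline")
visible_offline = store.read("note")   # Served from the local cache.
synced_before = "note" in store.remote  # Still False: nothing was sent yet.
store.reconnect()                       # Queued write reaches the remote store.
```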

Transactions

Firestore supports multi-document, ACID-compliant transactions. The term ACID is short for atomicity, consistency, isolation, and durability.

Queries

Firestore offers strongly consistent queries across the entire database. Along with primary indexes, Firestore supports secondary and composite indexes to quickly look up locations of items that you request in a query.

Firestore queries are limited in the following ways:

  • Firestore is a non-relational database, so it doesn't support relational schemas or queries that use SQL semantics. In particular, Firestore doesn't support join operations, inequality filtering on multiple properties, or filtering on data that is based on results of a subquery.
    • If your app requires SQL support for non-horizontal scales, use Cloud SQL.
    • If your app requires SQL support for larger horizontal and global scales, use Cloud Spanner.
  • Firestore is optimized for online transaction processing (OLTP).

Security

Firestore encrypts all data automatically before it writes the data to disk. Firestore also offers robust access management and authentication; the methods that are available depend on the client libraries that you use.

Autoscaling

Firestore scales up automatically, with no downtime. This scaling mechanism lets Firestore serve thousands of requests per second and millions of concurrent connections. You pay only for your actual usage based on storage size and the number of operations. For more information, see Firestore pricing.

Firestore autoscaling is limited in the following ways:

  • Firestore Native Mode can scale document update operations to a maximum of 10,000 writes per second and more than one million connections. If your app exceeds these write rate limits, we recommend that you use Datastore Mode, which scales up to millions of writes per second. To learn more about the differences between these modes, see Choosing between Native Mode and Datastore Mode.
  • Firestore can handle operations on a massive scale. However, to support complex features like replication and transactions, Firestore makes some tradeoffs that might slow performance for apps that are expected to support extreme loads.
    • If your app is extremely write-heavy, consider using Cloud Bigtable for greater data ingestion capabilities at the expense of transactions and secondary indexes.
    • If your app often displays the same information to users, such as a player leaderboard in gaming, consider client-side caching to reduce load by avoiding unnecessary requests to the server.
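
The client-side caching suggestion in the last item can be sketched as a small time-based cache. This is one possible approach, not a Firestore feature: the TtlCache class, the fetch callable, and the 30-second lifetime in the usage note are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Client-side cache that refetches a value only after it expires."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # Hypothetical function that queries the server.
        self._ttl = ttl_seconds
        self._clock = clock      # Injectable so that tests can control time.
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached value, calling fetch only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

For example, wrapping a leaderboard query as TtlCache(load_leaderboard, ttl_seconds=30), where load_leaderboard is a hypothetical function, would send at most one request per key every 30 seconds, however often the screen is redrawn.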

Latency

Firestore prioritizes durability and availability over latency by doing cross-region or cross-zone synchronous writes. If your app demands consistent sub–10 millisecond latency when reading or writing data, consider using an in-memory database like Memorystore for Redis.

Redundancy and availability

Firestore offers the following levels of multi-location redundancy that are based on different replication mechanisms:

  • Regional replication is best if your priority is to achieve low write latency. When you use regional replication, data is replicated within the same region. To ensure the lowest latency, you might want to colocate your app in the same region.
  • Multi-region replication is best if your priority is to ensure availability. When you use multi-region replication, data is replicated in multiple zones across at least two different regions, so the database is resilient to regional outages. Multi-region replication provides increased availability and redundancy, but it has higher write latency. A witness node is deployed in a third region to act as a tiebreaker between the two replicated regions, as shown in figure 1.

Replication scheme of a multi-region database.

Figure 1. Diagram of a multi-region database in Firestore.
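
The tiebreaker role of the witness can be illustrated with a simplified majority-vote model. This sketch is an assumption made for explanation and is not Firestore's actual replication protocol. The voter names are hypothetical.

```python
from typing import Iterable

# Hypothetical voters: two replicated regions plus a witness in a third region.
VOTERS = frozenset({"region-a", "region-b", "witness"})


def has_quorum(reachable: Iterable[str]) -> bool:
    """Return True when a strict majority of the voters is reachable."""
    return len(VOTERS & set(reachable)) * 2 > len(VOTERS)
```

Under this model, has_quorum({"region-a", "witness"}) is true and has_quorum({"region-a"}) is false: when one replicated region is lost, the witness lets the surviving region still form a majority.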

Client libraries

Mobile and web clients can directly access Firestore by using the Android, iOS, or web client libraries. Firestore also seamlessly integrates with the Firebase platform, which provides features like crash reporting, user authentication, message delivery, and user event analytics.

When to use Firestore

Firestore features make it suitable for a wide range of use cases, including the following:

  • User profiles: Manage user profiles to customize the user experience based on the users' past activities and preferences. You can use Firestore's flexible schema to evolve the structure of user profiles. For example, you can add new properties to support new features in your app. Schema changes happen with no downtime, and performance doesn't degrade even as the number of users grows.
  • Real-time inventories: Use Firestore's rich, nested objects to store large amounts of non-homogeneous, sparse data for diverse products without overspecializing the structure. For example, you can create a product catalog for a retailer.
  • User session management: Firestore's support for ACID transactions helps ensure that users can lock down one or more documents until their transaction is complete. For example, you can create shopping carts for retail transactions, or a multipart processing form for booking events.
  • State mutations: Use Firestore's ACID transactions to propagate mutations across large numbers of concurrent users. For example, you can maintain a consistent state for all players in a gaming app.
  • Persistent write-through cache: Firestore's high availability and durability provide persistent state and prevent potential data loss caused by an app crash. Firestore offers features like a simple-to-use key-value store. However, Firestore doesn't have a built-in time-to-live (TTL) or cache expiration mechanism.
  • Cross-device data synchronization: Firestore's real-time updates ensure that all connected devices always display the latest state. For example, Firestore provides a consistent state for collaborative multi-user mobile apps and apps where you connect from multiple devices.
  • IoT management and asset tracking: Firestore's offline data persistence lets you record data points even when devices lose network connectivity. For example, you can set up real-time GPS tracking of mobile devices and vehicles.
  • Real-time capabilities: Firestore's real-time updates let you set up real-time analytics and messaging. You can keep visual graphs and charts up to date, like interactive visual dashboards, and set up live discussion forums and chat rooms.
  • Distributed counters: Set up distributed counters to display document interactions such as a count of likes on a post or favorites of a specific item.
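
The distributed counter idea in the last item can be sketched in memory: increments are spread across several shards so that no single record absorbs every write, and the total is the sum of all shards. The ShardedCounter class below is a hypothetical illustration. In a real app, each shard would be its own document, and the shard count would be tuned to the expected write rate.

```python
import random
from typing import List, Optional


class ShardedCounter:
    """In-memory sketch of a counter that is split across several shards."""

    def __init__(self, num_shards: int, rng: Optional[random.Random] = None) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self._shards: List[int] = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount: int = 1) -> None:
        """Add to one randomly chosen shard to spread out write load."""
        index = self._rng.randrange(len(self._shards))
        self._shards[index] += amount

    def total(self) -> int:
        """Read every shard and sum the values to get the full count."""
        return sum(self._shards)


likes = ShardedCounter(num_shards=10)
for _ in range(1000):
    likes.increment()
```

The tradeoff in this design is that reading the count costs one read per shard, in exchange for a higher sustainable write rate.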

Reference architectures

This section provides reference architectures for building large web apps that combine Firestore with other Google Cloud products for purposes that include the following:

  • Daily exports
  • Caching
  • Data processing
  • Training models for machine learning

These architectures aren't prescriptive. Instead, they highlight the breadth of possible uses for Firestore in building scalable web apps. You can reorganize and adapt the architectures to build your own web app that fulfills your requirements.

Gaming

A gaming platform supports concurrent access by tens of thousands of players. The game's frontend services use Firestore to store billions of documents with hierarchical world state data. Firestore also holds user data like user configuration, party memberships, guilds, friends lists, and presence data. This use case incorporates other Google Cloud products as follows:

  • Spanner provides a globally consistent database that can keep inventory or match history for massive player populations anywhere in the world.
  • A regional in-memory cache is deployed on Memorystore for Redis to speed access to frequently used data.
  • Events are logged to Cloud Bigtable, where developers or support staff can access them for troubleshooting.
  • Data from frontend and backend databases is regularly imported to BigQuery to run data analytics pipelines. These pipelines help discover exploits or uncover gameplay mechanics that need an update before they affect the game's community and drive players away.

Figure 2 shows the architecture of the gaming use case:

Architecture of the gaming use case.

Figure 2. Example of a gaming platform architecture.

Internet of Things

An interactive web app displays real-time telemetry information generated by Internet of Things (IoT) devices. The devices regularly measure the user's temperature and heart rate, and the data is then processed as follows:

  1. Each measurement is instantly submitted to IoT Core through MQTT and HTTP bridges.
  2. IoT Core publishes each measurement as an individual message to Pub/Sub.
  3. The Pub/Sub message triggers Cloud Functions that extract relevant information from the raw messages and save the results to Firestore for long-term storage.
  4. An interactive web user interface hosted on Firebase Hosting and powered by Angular listens for updates directly from Firestore. Each update is automatically pushed to the web user interface to visualize the latest information in real time.

Figure 3 shows the data pipeline for telemetry information in this scenario:

Architecture of the IoT app use case.

Figure 3. Example of an IoT app architecture.
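
Step 3 of this pipeline, which extracts relevant information from the raw message, might look like the following sketch. It assumes that the function receives a Pub/Sub-style event whose data field is base64-encoded JSON and whose attributes include deviceId. The payload field names (temperature, heart_rate, timestamp) and the output field names are hypothetical. The sketch only builds the record; writing it to Firestore is left out.

```python
import base64
import json
from typing import Any, Dict


def extract_measurement(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode a Pub/Sub-style event and keep only the fields to store."""
    # The message body arrives base64-encoded; the JSON layout is an assumption.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return {
        "device_id": event.get("attributes", {}).get("deviceId"),
        "temperature_c": float(payload["temperature"]),
        "heart_rate_bpm": int(payload["heart_rate"]),
        "measured_at": payload["timestamp"],
    }


raw = {"temperature": "36.6", "heart_rate": 72, "timestamp": "2020-01-01T00:00:00Z"}
sample_event = {
    "data": base64.b64encode(json.dumps(raw).encode("utf-8")),
    "attributes": {"deviceId": "wearable-1"},
}
record = extract_measurement(sample_event)
```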

Retail

A retail platform provides product recommendations to first-time buyers through different mediums. A web app records live data points about online users, like referrer, geographic region, and device type, and writes the collected data to Firestore. The data is then processed as follows:

  1. Each new record creation triggers a data pipeline in Cloud Functions that copies the user data to BigQuery.
  2. A recommendation engine, implemented with Spark MLlib and deployed on Dataproc, is trained with the live user data stored in BigQuery and with the product metadata stored in Cloud SQL.
  3. The recommendation engine provides the following predictions for recommended products:
    • Real-time predictions that are written to Firestore and automatically pushed to the online user devices.
    • Batch predictions that are sent to offline users by an email service.

Figure 4 shows the data flow for the retail platform scenario:

Architecture of the retail platform use case.

Figure 4. Example of a retail platform architecture.

Real-time capture of data changes

An app receives real-time user input that changes the global state. A dashboard in Data Studio tracks real-time events to better understand user behavior and interactions. When a user action updates any state value, the following events occur:

  1. Firestore triggers a Cloud Function that writes the change to BigQuery, including the old and new state values.
  2. The Data Studio dashboard runs real-time aggregation queries on the event data in BigQuery.
  3. The queries generate metrics like ratio of event changes aggregated to different buckets, unique type of events per time bucket, and event ingestion latency.

For a detailed presentation and demo of this architecture, see the Cloud Next '19 video Building amazing apps With Firestore.

Figure 5 shows the architecture for capturing real-time data changes:

Architecture of the data capture use case.

Figure 5. Example of a simple data capture architecture.
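
Step 1 of this flow records both the old and the new state values. A function that turns one document change into rows for an analytics table could be sketched as follows. The change_rows function and the row layout (doc_id, field, old_value, new_value, changed_at) are hypothetical, and the sketch stops short of the actual BigQuery write.

```python
from typing import Any, Dict, List


def change_rows(
    doc_id: str,
    old: Dict[str, Any],
    new: Dict[str, Any],
    changed_at: str,
) -> List[Dict[str, Any]]:
    """Return one row per field whose value differs between old and new."""
    rows: List[Dict[str, Any]] = []
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            rows.append({
                "doc_id": doc_id,
                "field": field,
                "old_value": before,  # None when the field was just added.
                "new_value": after,   # None when the field was removed.
                "changed_at": changed_at,
            })
    return rows


rows = change_rows(
    "game-1",
    {"level": 3, "lives": 2},
    {"level": 4, "lives": 2, "boss": True},
    "2019-04-10T12:00:00Z",
)
```

Storing one row per changed field keeps the aggregation queries simple, at the cost of more rows per update.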

Collaborative content editing

A collaborative content management system (CMS) lets multiple editors work at the same time on the same article. Every time an editor makes a change (for example, adding or deleting a character), the editor's client submits the change directly to Firestore.

If multiple editors submit changes at the same time, the following resolution process occurs:

  1. Firestore's transactions ensure that only the first received change is written to the database. Other changes are rejected.
  2. Firestore automatically sends the updated content to all editors.
  3. The editors that were initially rejected reapply their own changes on top of the updated content, and then resubmit the changes to Firestore.
  4. The same conflict resolution process repeats until all changes by all clients are accepted and written to the database.
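
The resolution process above follows an optimistic concurrency pattern, which the following in-memory sketch illustrates. It is not the Firestore transaction API: the VersionedDocument class, its version counter, and the submit_with_retry helper are hypothetical names used to show the accept, reject, reapply, and resubmit cycle.

```python
from typing import Callable


class VersionedDocument:
    """Hypothetical document that accepts a write only from the latest version."""

    def __init__(self, content: str = "") -> None:
        self.content = content
        self.version = 0

    def try_write(self, base_version: int, new_content: str) -> bool:
        """Accept the write only if it was based on the current version."""
        if base_version != self.version:
            return False  # Rejected: another change was written first.
        self.content = new_content
        self.version += 1
        return True


def submit_with_retry(
    doc: VersionedDocument, edit: Callable[[str], str], max_attempts: int = 5
) -> int:
    """Reapply the edit on the latest content until a write is accepted."""
    for attempt in range(1, max_attempts + 1):
        base_version, base_content = doc.version, doc.content
        if doc.try_write(base_version, edit(base_content)):
            return attempt
    raise RuntimeError("edit was rejected too many times")


doc = VersionedDocument("Hello")
# Two editors read the same version, then both try to write.
version_a, content_a = doc.version, doc.content
version_b, content_b = doc.version, doc.content
first = doc.try_write(version_a, content_a + " world")  # Accepted.
second = doc.try_write(version_b, content_b + "!")      # Rejected as stale.
# The rejected editor reapplies the change on top of the updated content.
attempts = submit_with_retry(doc, lambda text: text + "!")
```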

A staging pipeline lets editors preview the content as follows:

  1. A cron job hosted on Cloud Scheduler triggers a Cloud Function every minute.
  2. The function copies the latest content from Firestore to the staging database hosted on Cloud SQL.
  3. Editors preview the staged content on the staging server hosted on App Engine.

When the content is complete, an editor clicks the publish button in the CMS. This action triggers a Cloud Function that copies the latest content from Firestore to the production database hosted on Cloud SQL. Readers can then consume the newly published content on the production website. For a similar real-world example of this architecture, see the New York Times article We Built Collaborative Editing for Our Newsroom's CMS.

Figure 6 shows the pipeline for editing, staging, and publishing content in the collaborative content editing use case:

Architecture of the content editing use case.

Figure 6. Example of a collaborative content editing platform architecture.

Next steps