Designing multi-tenant architectures

This document helps you plan, design, and implement a multi-tenant architecture on Google Cloud by helping you choose between the following approaches:

  1. Same runtime environment for multiple tenants.
  2. One runtime environment for each tenant domain.
  3. One resource container for each tenant domain.

These approaches are listed in increasing order of isolation between tenants, but they're not necessarily mutually exclusive. You can implement a hybrid approach where you choose different options for different workload classes. For example, you can provide a shared runtime environment for multiple tenant domains to a set of tenants, and provide a runtime environment for each tenant domain to another set of tenants.

This document is useful if you need to provision and configure a multi-tenant architecture on Google Cloud, and you want a single administrative entity to be in charge of that architecture. For example, you might need to isolate different business units, teams, or users of your infrastructure, or you might need to deploy existing applications without completely rewriting them. In this document, multi-tenant architecture refers to multi-tenancy at the infrastructure level and not at the application level. For example, you deploy multiple instances of an application, one instance for each tenant.

This document is also useful if you're evaluating the opportunity to design a multi-tenant architecture that a single administrative entity manages, and you want to explore what it might look like. Choosing one option over the others depends on several factors, and no option is inherently better than the others. Each option has its own strengths and weaknesses. To choose an option, do the following:

  1. Establish a set of criteria to evaluate the different options you have to plan, design, and implement a multi-tenant architecture.
  2. Assess each option against the evaluation criteria.
  3. Choose the option that best suits your needs.

Terminology

The following terms are important for understanding how to design, plan, and implement a multi-tenant architecture on Google Cloud:

  • Runtime environment is an environment where you deploy your workloads.
  • Resource containers are folder resources or project resources.
  • Tenant is an entity that is responsible for operating sets of related workloads in one or multiple runtime environments.
  • Tenant domain is an environment where the resources and workloads belonging to a tenant are clearly distinguishable from resources belonging to other tenants.
  • Administrative isolation guarantees that resources belonging to a tenant domain are logically separated from resources belonging to other tenant domains.
  • Resource consumption isolation guarantees a fair allocation of resources to each tenant domain, and prevents an excessive resource consumption by tenants. Resource consumption isolation implies administrative isolation.
  • Runtime environment isolation guarantees the isolation of the runtime environments in each tenant domain from the runtime environments in other tenant domains. Any modification to a runtime environment in a tenant domain can't impact the runtime environments in other tenant domains. Runtime environment isolation implies resource consumption isolation.
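The implication chain between the three isolation types can be modeled as an ordered scale. The following is a minimal sketch, assuming a hypothetical `Isolation` enumeration and `satisfies` helper that are not part of any Google Cloud API; it only illustrates that a stronger isolation type covers the weaker ones.

```python
from enum import IntEnum


class Isolation(IntEnum):
    """Isolation types, ordered so that a higher value implies every lower one."""

    ADMINISTRATIVE = 1
    RESOURCE_CONSUMPTION = 2
    RUNTIME_ENVIRONMENT = 3


def satisfies(provided: Isolation, required: Isolation) -> bool:
    """Return True if the provided isolation type covers the required one."""
    return provided >= required


# Runtime environment isolation implies resource consumption isolation,
# which in turn implies administrative isolation.
print(satisfies(Isolation.RUNTIME_ENVIRONMENT, Isolation.ADMINISTRATIVE))  # True
print(satisfies(Isolation.ADMINISTRATIVE, Isolation.RESOURCE_CONSUMPTION))  # False
```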

Establishing the criteria to evaluate multi-tenant architectures

To establish the criteria to evaluate the options for a multi-tenant architecture, you consider the most important features that you need in these architectures. To gather information about which features you need the most, you assess your workloads. For more information about assessing your workloads, see Migration to Google Cloud: Assessing and discovering your workloads. If you need to assess a containerized environment, read Migrating containers to Google Cloud: Getting started.

The following evaluation criteria and the order in which they're listed are an example. We recommend that you assess your workloads to compile a list of the criteria that are important for you and your workloads, and order them according to importance. For example, after assessing your workloads, you might consider the following evaluation criteria:

  1. Type of isolation that your workloads require. Which isolation guarantees do you need to support? What types of isolation do your workloads need? Choose between administrative isolation, resource consumption isolation, and runtime environment isolation.
  2. Degree of control that each tenant has on its tenant domain. How many customization options do your tenants need? How much freedom do your tenants need to manage their tenant domains? Do you need to maintain a tight control on each tenant domain?
  3. Anticipated number of tenants. Do you need to manage a high number of tenant domains? If you don't have the resources or expertise to automate as many of the maintenance operations as possible, the complexity of your architecture can increase with the number of tenant domains.
  4. Cost attribution. Do you need to bill the consumption of resources to your tenants, or are costs attributed to the entity that's responsible for the multi-tenant architecture? If you need to accurately account for the resource consumption of each tenant, then sharing parts of the environment between tenants, such as control planes, might not be possible.
  5. Federated identity. Do your tenants have their own identity providers that they need to federate with, or are you providing a centralized identity provider?
  6. Content or resource sharing between your tenants. Is content or resource sharing between tenants required? What type of content and resources do your tenants need to share? What type of support do your tenants need from the environment and the infrastructure to share content and resources?
  7. Limits of the supporting infrastructure. Does the infrastructure that supports your multi-tenant architecture enforce limits on the resources that you can consume? For example, Google Cloud enforces quotas on resource usage. While you can apply to raise some of these quotas, others are fixed and can't be raised.

Assessing the multi-tenant architecture options

On Google Cloud, you have different options for implementing a multi-tenant architecture. To choose the best option for your workloads, you first assess each option against the evaluation criteria that you established. You assign each option a score against each evaluation criterion from an arbitrary, ordered scale. For example, you can score each option on a scale from 1 to 10 against each evaluation criterion.
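One way to record these scores is a small scorecard per option. The following is a minimal sketch, assuming a 1 to 10 scale and the seven example criteria from the previous section; the `Scorecard` class, the criterion identifiers, and the sample score are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Example criteria from the previous section; replace them with your own list.
CRITERIA = (
    "type_of_isolation",
    "degree_of_control",
    "number_of_tenants",
    "cost_attribution",
    "federated_identity",
    "content_sharing",
    "infrastructure_limits",
)


@dataclass
class Scorecard:
    """Scores for one architecture option, one score per criterion."""

    option: str
    scores: Dict[str, int] = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        """Record a score after checking the criterion name and the scale."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 10:
            raise ValueError("score must be between 1 and 10")
        self.scores[criterion] = score

    def is_complete(self) -> bool:
        """Return True once every criterion has a score."""
        return set(self.scores) == set(CRITERIA)


card = Scorecard("same runtime environment for multiple tenants")
card.rate("type_of_isolation", 6)  # illustrative score, not a recommendation
print(card.is_complete())  # False
```

Validating the scale when a score is recorded keeps the later totals comparable across options.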

Same runtime environment for multiple tenants

When you use the same runtime environment for multiple tenant domains, you isolate tenant domains by using features provided by the runtime environment that hosts them. For example, you can provision a Google Kubernetes Engine (GKE) cluster and a Namespace for each tenant domain in that cluster. Or, you can provision a Compute Engine instance and use the operating system to isolate tenant domains.

Use the following list to evaluate this option against the criteria that you established earlier:

  1. Type of isolation that your workloads require. This option is viable if your workloads require only administrative isolation or resource consumption isolation. For example, Kubernetes supports Namespaces for administrative isolation and resource quotas for resource consumption isolation. If you need runtime environment isolation, we recommend choosing one of the other options.
  2. Degree of control that each tenant has on its tenant domain. This option suits cases where you enforce tight control on tenant domains. For example, you want every tenant domain to adhere to a recommended configuration and state.
  3. Anticipated number of tenants. Managing a high number of tenant domains in the same runtime environment might risk impacting multiple tenants for each change in the configuration of the runtime environment. For example, if you have any issues with the control plane of the runtime environment, these issues can impact multiple tenant domains in that runtime environment. However, using the same runtime environment can reduce the complexity of automating the provisioning and configuration of tenant domains. For example, if you're using a GKE cluster as a runtime environment for multiple tenant domains, provisioning a new tenant domain might only require a new Namespace and enforcing resource quotas.
  4. Cost attribution. Attributing costs to tenants can be challenging when multiple tenants share the same runtime environment. Parts of the environment, such as the control plane, are shared between tenants, which makes it difficult to divide costs correctly. Also, you need a comprehensive monitoring system to gather metrics about resource consumption to correctly bill your tenants. If you need strict cost attribution and billing for each tenant domain, we recommend choosing one of the other options.
  5. Federated identity. If your tenants have their own identity providers to federate with, you need to assess if the runtime environment is suitable for this kind of integration. You also need to assess if you want to allow your tenants to use and manage their own identity management system. If you want to maintain tight control on the structure of tenant domains, you can use a centralized identity management system. Provisioning and configuring an identity management system in a shared runtime environment is a non-trivial challenge. If your tenants require this kind of integration, we recommend choosing one of the other options.
  6. Content or resource sharing between your tenants. If your tenants need to share resources or content between different tenant domains, this option can simplify the implementation of sharing mechanisms, because all tenant domains are in the same runtime environment. You don't have to create a secure communication channel between different tenant domains, because you can use the features of the runtime environment to allow communication between them. We recommend choosing this option if you need to support content-sharing use cases.
  7. Limits of the supporting infrastructure. This option is suitable if your tenants have homogeneous resource requirements, because every tenant domain is limited to the resources that the shared runtime environment provides. For example, a tenant can't request a resource type that the runtime environment doesn't offer.
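To illustrate the GKE example in this list, where a new tenant domain needs a Namespace and a resource quota, the following sketch builds the two Kubernetes manifests as plain Python dictionaries. The `tenant_manifests` function, the `tenant` label, the `-quota` name suffix, and the quota values are assumptions made for illustration; how you apply the manifests to a cluster is left out.

```python
import json


def tenant_manifests(tenant: str, cpu: str, memory: str, max_pods: int) -> list:
    """Build a Namespace and a ResourceQuota manifest for one tenant domain."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        # The "tenant" label is a hypothetical convention, not a required field.
        "metadata": {"name": tenant, "labels": {"tenant": tenant}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            # Caps on the total resources requested in the tenant's Namespace.
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    return [namespace, quota]


# Illustrative values only; size the quota from your own workload assessment.
manifests = tenant_manifests("tenant-a", cpu="4", memory="8Gi", max_pods=20)
print(json.dumps(manifests, indent=2))
```

Generating both manifests from one function keeps the Namespace and its quota in step, which matters when the number of tenant domains grows.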

One runtime environment for each tenant domain

When you use one runtime environment for each tenant domain, you isolate the domains by using features provided by the platform that hosts the runtime environments. For example, you can provision a GKE cluster for each tenant domain, or you can provision a Compute Engine instance or instance group for each tenant domain.

Use the following list to evaluate this option against the criteria that you established earlier:

  1. Type of isolation that your workloads require. This option is suitable if your workloads require runtime environment isolation. For example, by using a GKE cluster for each tenant domain, you isolate the runtime environments and their control planes from each other.
  2. Degree of control that each tenant has on its tenant domain. This option allows each tenant to customize their runtime environment. Runtime environments can be self-managed, so that each tenant manages its runtime environments, or you can have a dedicated team that manages all runtime environments. The choice depends on the degree of customization that you intend to offer to your tenants. For example, if you're using GKE, you can let your tenants self-manage their clusters. This option lets you support different runtime environment types. For example, if you're using GKE, you can provision different kinds of node pools: one that offers GPU resources, and another that offers Cloud TPU resources.
  3. Anticipated number of tenants. If you plan for a high number of tenant domains, provisioning and configuring a runtime environment for each tenant domain might prove to be a non-trivial task. Also, this option might not be as efficient as having a single runtime environment because your tenants might not fully use the resources of their runtime environments. For example, if you're using GKE, the workloads deployed in a cluster might not fully consume the cluster's resources. If you're aiming for efficiency and resource utilization, we recommend choosing one of the other options.
  4. Cost attribution. Cost attribution is easier when tenants have their own runtime environment. You can use the monitoring and billing tools provided by the infrastructure that supports your runtime environments to accurately measure resource consumption and attribute costs to each tenant. For example, on Google Cloud you can set up billing reports to analyze costs for each runtime environment.
  5. Federated identity. If your tenants have their own identity providers to federate with, assess if the runtime environment is suitable for this kind of integration, and if you want to allow your tenants to use and manage their own identity management system. If you want to maintain tight control on the structure of your tenant domains, you might want to use a centralized identity management system. If your tenants require this kind of integration, having a dedicated runtime environment for each tenant domain helps avoid any impact on other tenant domains.
  6. Content or resource sharing between your tenants. Sharing content between different runtime environments requires additional infrastructure, provisioning, and configuration. For example, you can set up your authentication and authorization systems to share content between tenant domains across different runtime environments.
  7. Limits of the supporting infrastructure. The infrastructure that supports your multi-tenant architecture might limit the number of runtime environments that you can provision. These limits greatly impact the number of tenant domains that you can support, because of the one-to-one mapping between tenant domains and runtime environments. For example, GKE imposes limits on the number of clusters you can create in a given zone or region.
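Because each tenant domain consumes at least one runtime environment, an environment limit translates directly into a ceiling on tenant domains. The following is a rough sketch of that arithmetic; the `max_tenant_domains` function and the limit of 50 are hypothetical, so check the actual quotas that apply to your projects.

```python
def max_tenant_domains(environment_limit: int, environments_per_tenant: int = 1) -> int:
    """Return how many tenant domains fit under a runtime environment limit."""
    if environment_limit < 0 or environments_per_tenant < 1:
        raise ValueError("limit must be >= 0 and environments per tenant >= 1")
    # Integer division: a tenant domain needs all of its environments to exist.
    return environment_limit // environments_per_tenant


# Hypothetical limit of 50 runtime environments.
print(max_tenant_domains(50))  # 50
print(max_tenant_domains(50, environments_per_tenant=3))  # 16
```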

One resource container for each tenant domain

When you use one resource container for each tenant domain, you isolate tenant domains by using features provided by the platform that hosts the runtime environments. For example, you can provision a Google Cloud project or folder for each tenant domain. While using folders can impact the isolation properties of tenant domains, this setup eases the management of the infrastructure supporting the multi-tenant architecture. You can encourage reuse, deduplication, and logical separation of the components of the architecture.

Use the following list to evaluate this option against the criteria that you established earlier:

  • Type of isolation that your workloads require. This option is suitable if your workloads require runtime environment isolation. For example, you create a Google Cloud project for each tenant domain to completely isolate tenant domains from each other.
  • Degree of control that each tenant has on its tenant domain. This option allows each tenant to customize their runtime environment and resource hierarchy. Also, you can support heterogeneous runtime environments, even combining multiple runtime environments in a single tenant domain. This option is suitable if your tenants require multiple runtime environments for each tenant domain. For example, if you're using GKE, you can provision multiple clusters for each tenant domain to meet the requirements of the workloads in that tenant domain.
  • Anticipated number of tenants. If you plan for a high number of tenant domains, provisioning and configuring multiple runtime environments for each tenant domain can be a non-trivial task. This option might not be as efficient as having a single runtime environment shared between tenants, or even a runtime environment for each tenant, because your tenants might not fully use the resources of their runtime environments. For example, if you're using GKE, the workloads deployed in a cluster might not fully consume that cluster's resources. If you're aiming for efficiency and resource utilization, we recommend choosing one of the other options.
  • Cost attribution. Cost attribution is easier when tenants have their own resource containers. You can use the monitoring and billing tools provided by the infrastructure that supports your runtime environments to accurately measure resource consumption and attribute costs to each tenant. For example, on Google Cloud you can set up billing reports to analyze costs for each project.
  • Federated identity. You can accommodate complex identity federation requirements, because tenants have the freedom to provision and configure their own solution in their tenant domain. Provisioning and configuring their own solution doesn't hinder the possibility of integrating with a centralized solution.
  • Content or resource sharing between your tenants. Sharing content between different resource containers requires additional infrastructure, provisioning, and configuration. For example, you might need to set up a space that is accessible from different tenant domains to place the shareable content.
  • Limits of the supporting infrastructure. The infrastructure that supports your multi-tenant architecture can impose limits on the number of resource containers that you can provision. These limits greatly impact the number of tenant domains that you can support because of the one-to-one mapping between tenant domains and resource containers. For example, Google Cloud enforces a quota on the number of projects that you can create.

Choosing the correct option for your target environment

In the previous sections, you assigned a score to each option against every criterion. To calculate the total score of each option, you add all the scores for that option. For example, if an option scored 10 against the type of isolation criterion, and 6 against the degree of control criterion, the total score of that option is 16.
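The example above, restricted to the two criteria it mentions, can be written as a short calculation. This is a minimal sketch; the `total_score` function and the criterion identifiers are hypothetical names.

```python
def total_score(scores: dict) -> int:
    """Add the scores that one option received across all criteria."""
    return sum(scores.values())


# Scores from the example: 10 for type of isolation, 6 for degree of control.
scores = {"type_of_isolation": 10, "degree_of_control": 6}
print(total_score(scores))  # 16
```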

You can also assign different weights to the score against each criterion so that you can represent the importance that each criterion has for your evaluation. For example, if type of isolation is more important to you than degree of control, you might define multipliers to reflect that: a 1.0 multiplier for type of isolation and a 0.7 multiplier for degree of control. You then multiply each score by its multiplier and add the results to calculate the total score of an option.
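Applied to the earlier scores of 10 and 6, the multipliers in this example give a weighted total of 14.2. The following sketch shows the calculation; the `weighted_score` function is a hypothetical name, and treating a missing multiplier as 1.0 is an assumption of this sketch rather than a rule from this document.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Multiply each score by its criterion's multiplier and add the results."""
    # Assumption: a criterion without an explicit multiplier counts at 1.0.
    return sum(score * weights.get(criterion, 1.0) for criterion, score in scores.items())


scores = {"type_of_isolation": 10, "degree_of_control": 6}
weights = {"type_of_isolation": 1.0, "degree_of_control": 0.7}

# Rounded for display because floating-point products are not always exact.
print(round(weighted_score(scores, weights), 2))  # 14.2
```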

After calculating the total score of each option that you evaluated, you sort the options by their total scores, in descending order. Then, pick the option with the highest score as your option of choice.
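The ranking step can be sketched as a sort over the totals. The totals below are placeholder values chosen only to show the mechanics; they are not an assessment of the three options.

```python
# Placeholder totals; use the totals from your own evaluation.
totals = {
    "same runtime environment for multiple tenants": 41,
    "one runtime environment for each tenant domain": 47,
    "one resource container for each tenant domain": 44,
}

# Sort options by total score, highest first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for option, score in ranking:
    print(f"{score:>3}  {option}")

best_option, best_score = ranking[0]
```

If two options tie, `sorted` keeps them in their original order, so you might want an explicit tie-breaking criterion.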

There are multiple ways to represent this data. For example, you can visualize the results with a chart that's suited to multivariate data, such as a radar chart.

What's next