Instances, clusters, and nodes

To use Cloud Bigtable, you create instances, which contain up to 4 clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.

This page provides more information about Cloud Bigtable instances, clusters, and nodes.

Before you read this page, you should be familiar with the overview of Cloud Bigtable.

Instances

A Cloud Bigtable instance is a container for your data. Instances have one or more clusters, located in different zones. Each cluster has at least 1 node.

A table belongs to an instance, not to a cluster or node. If you have an instance with more than one cluster, you are using replication. This means you can't assign a table to an individual cluster or create unique garbage-collection policies for each cluster in an instance. You also can't make each cluster store a different set of data in the same table.

An instance has a few important properties that you need to know about:

  • The storage type (SSD or HDD)
  • The application profiles, which are primarily for instances that use replication

The following sections describe these properties.

Storage types

When you create an instance, you must choose whether the instance's clusters will store data on solid-state drives (SSD) or hard disk drives (HDD). SSD is often, but not always, the most efficient and cost-effective choice.

The choice between SSD and HDD is permanent, and every cluster in your instance must use the same type of storage, so make sure you pick the right storage type for your use case. See Choosing between SSD and HDD storage for more information to help you decide.

Application profiles

After you create an instance, Cloud Bigtable uses the instance to store application profiles, or app profiles. For instances that use replication, app profiles control how your applications connect to the instance's clusters.

If your instance doesn't use replication, you can still use app profiles to provide separate identifiers for each of your applications, or each function within an application. You can then view separate charts for each app profile in the Cloud Console.

To learn more about app profiles, see application profiles. To learn how to set up your instance's app profiles, see Configuring app profiles.

Clusters

A cluster represents the Cloud Bigtable service in a specific location. Each cluster belongs to a single Cloud Bigtable instance, and an instance can have up to 4 clusters. When your application sends requests to a Cloud Bigtable instance, those requests are handled by one of the clusters in the instance.

Each cluster is located in a single zone. An instance's clusters must each be in unique zones. You can create an additional cluster in any zone where Cloud Bigtable is available. For example, if the first cluster is in us-east1-b, you can choose a different zone in the same region, such as us-east1-c, or a zone in a separate region, such as europe-west2-a. For a list of zones and regions where Cloud Bigtable is available, see Cloud Bigtable Locations.

Cloud Bigtable instances with only 1 cluster do not use replication. If you add a second cluster to an instance, Cloud Bigtable automatically starts replicating your data by keeping separate copies of the data in each of the clusters' zones and synchronizing updates between the copies. You can choose which cluster your applications connect to, which makes it possible to isolate different types of traffic from one another. You can also let Cloud Bigtable balance traffic between clusters. If a cluster becomes unavailable, you can fail over from one cluster to another. To learn more about how replication works, see Overview of Replication.
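The routing behavior described above can be expressed as a toy model. This is an illustrative sketch only, not the Cloud Bigtable client API or its actual routing algorithm: single-cluster routing pins requests to one named cluster, while multi-cluster routing lets the service choose any available cluster and fail over automatically.

```python
# Toy model of request routing between clusters (illustrative only;
# not the Cloud Bigtable client API or routing algorithm).

def route_request(clusters, routing, preferred=None):
    """Pick the cluster that serves a request.

    clusters:  dict mapping cluster ID -> True if the cluster is available
    routing:   "single-cluster" or "multi-cluster"
    preferred: the cluster ID that an app profile names for single-cluster routing
    """
    if routing == "single-cluster":
        # Requests always go to the named cluster; if it is unavailable,
        # the request fails unless you fail over manually.
        if clusters.get(preferred):
            return preferred
        raise RuntimeError(f"cluster {preferred!r} unavailable; manual failover needed")
    # Multi-cluster routing: Bigtable serves the request from an
    # available cluster and fails over automatically.
    for cluster_id, available in clusters.items():
        if available:
            return cluster_id
    raise RuntimeError("no cluster available")

clusters = {"us-east1-b": False, "europe-west2-a": True}
print(route_request(clusters, "multi-cluster"))  # europe-west2-a
```

The model captures the isolation trade-off: single-cluster routing isolates traffic to a known cluster, while multi-cluster routing trades that control for automatic failover.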

Nodes

Each cluster in an instance has 1 or more nodes, which are compute resources that Cloud Bigtable uses to manage your data.

Behind the scenes, Cloud Bigtable splits all of the data in a table into separate tablets. Tablets are stored on disk, separate from the nodes but in the same zone as the nodes. A tablet is associated with a single node.

Each node is responsible for:

  • Keeping track of specific tablets on disk.
  • Handling incoming reads and writes for its tablets.
  • Performing maintenance tasks on its tablets, such as periodic compactions.
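Because tablets live on disk rather than on the nodes themselves, moving work between nodes is a metadata change, not a data copy. The following is a conceptual sketch of that idea (illustrative only; the names and structure are hypothetical, not Bigtable internals):

```python
# Conceptual sketch: tablets live on disk, separate from nodes.
# A node only *tracks* its tablets, so rebalancing between nodes
# updates pointers rather than copying tablet data.

tablets = {f"tablet-{i}": None for i in range(6)}  # tablet -> owning node

def rebalance(tablets, nodes):
    """Spread tablet ownership evenly across the cluster's nodes."""
    for i, tablet in enumerate(sorted(tablets)):
        tablets[tablet] = nodes[i % len(nodes)]  # reassign the pointer only

rebalance(tablets, ["node-1", "node-2"])
# Each node now tracks 3 of the 6 tablets; no tablet data moved on disk.
```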

A cluster must have enough nodes to support its current workload and the amount of data it stores. Otherwise, the cluster might not be able to handle incoming requests, and latency could go up. Monitor your clusters' CPU and disk usage, and add nodes to a cluster when its metrics exceed the recommendations and limits listed below.

For more details about how Cloud Bigtable stores and manages data, see Cloud Bigtable architecture.

CPU usage

Cloud Bigtable reports the following metrics for CPU usage:

  • Average CPU utilization: The average CPU utilization across all nodes in the cluster. The recommended maximum values provide headroom for brief spikes in usage. If a cluster exceeds the recommended maximum value for your configuration for more than a few minutes, add nodes to the cluster.

  • CPU utilization of hottest node: The CPU utilization of the busiest node in the cluster. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. If so:

      • Use the Key Visualizer tool to identify hotspots in your table that might be causing spikes in CPU utilization.
      • Check your schema design to make sure it supports an even distribution of reads and writes across each table.

The values for these metrics should not exceed the following:

  • Single cluster: 70% average CPU utilization; 90% CPU utilization of the hottest node
  • Any number of clusters with single-cluster routing: 70% average CPU utilization; 90% CPU utilization of the hottest node
  • 2 clusters with multi-cluster routing: 35% average CPU utilization; 45% CPU utilization of the hottest node
  • 3 or more clusters with multi-cluster routing: Depends on your configuration. See the examples of replication settings for common use cases.
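The recommended maximums above can be expressed as a small lookup. This is a sketch based only on the values in the table; for 3 or more clusters with multi-cluster routing the maximums depend on your configuration, so the sketch refuses to guess:

```python
def recommended_max_cpu(routing, num_clusters):
    """Return (max average CPU %, max hottest-node CPU %) for a cluster.

    Values come from the table above. For 3 or more clusters with
    multi-cluster routing, the maximums depend on your configuration,
    so this sketch raises instead of guessing.
    """
    if routing == "single-cluster" or num_clusters == 1:
        return (70, 90)
    if routing == "multi-cluster" and num_clusters == 2:
        return (35, 45)
    raise ValueError("depends on your configuration; see the replication examples")

def over_budget(routing, num_clusters, average_cpu, hottest_cpu):
    """True if either metric exceeds its recommended maximum (add nodes)."""
    max_avg, max_hot = recommended_max_cpu(routing, num_clusters)
    return average_cpu > max_avg or hottest_cpu > max_hot

print(over_budget("single-cluster", 1, 75, 80))  # True: average is over 70%
print(over_budget("multi-cluster", 2, 30, 40))   # False: both within limits
```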

Disk usage

Cloud Bigtable reports the following metrics for disk usage:

  • Storage utilization (bytes): The amount of data stored in the cluster. This value affects your costs, and you might need to add nodes to each cluster as the amount of data increases.

  • Storage utilization (% max): The percentage of the cluster's storage capacity that is in use. The capacity is based on the number of nodes in your cluster. In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit. If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs. For details about how this value is calculated, see Storage utilization per node.

  • Disk load: The percentage of the maximum possible bandwidth for HDD reads and writes that your cluster is using. Available only for HDD clusters. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage.
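The 70% guideline above can be turned into a quick capacity check. In this sketch the per-node capacity figures are placeholder assumptions, not authoritative limits; check the current Cloud Bigtable quotas for the real hard limits per node:

```python
import math

# Placeholder per-node hard limits (assumed values for illustration;
# consult the current Cloud Bigtable quotas for the real numbers).
TB = 1024**4
CAPACITY_PER_NODE = {"SSD": 2.5 * TB, "HDD": 8 * TB}

def storage_utilization(stored_bytes, num_nodes, storage_type):
    """Percentage of the cluster's storage capacity that is in use."""
    capacity = num_nodes * CAPACITY_PER_NODE[storage_type]
    return 100 * stored_bytes / capacity

def nodes_needed(stored_bytes, storage_type, target_pct=70):
    """Smallest node count that keeps utilization at or under target_pct
    (70% by default, leaving room to add more data)."""
    per_node = CAPACITY_PER_NODE[storage_type]
    return math.ceil(stored_bytes / (per_node * target_pct / 100))

# A 3-node SSD cluster storing 6 TiB is at 80%, over the 70% guideline:
print(round(storage_utilization(6 * TB, 3, "SSD")))  # 80
print(nodes_needed(6 * TB, "SSD"))                   # 4 keeps it at or under 70%
```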

Nodes for replicated clusters

In an instance that uses replication, make sure each cluster has enough nodes to support your use case:

  • If you use replication to provide high availability, or if you use multi-cluster routing in any of your app profiles, each cluster should have the same number of nodes. Also, as shown above under CPU usage, the recommended CPU utilization is reduced by half.

    This configuration helps ensure that if an automatic failover is necessary, the responsive cluster has enough capacity to handle all of your traffic.

  • If all of your app profiles use single-cluster routing, each cluster can have a different number of nodes. Resize each cluster as needed based on the cluster's workload.

    Because Cloud Bigtable stores a separate copy of your data with each cluster, each cluster must always have enough nodes to support your disk usage and to replicate writes between clusters.

    You can still fail over manually from one cluster to another if necessary. However, if one cluster has many more nodes than another, and you need to fail over to the cluster with fewer nodes, you might need to add nodes first. There is no guarantee that additional nodes will be available when you need to fail over—the only way to reserve nodes in advance is to add them to your cluster.
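The failover-capacity reasoning above also explains why the CPU targets are halved for 2 clusters with multi-cluster routing: if one cluster becomes unavailable, the survivor must absorb all of the traffic. A toy check (illustrative only, using the 70% single-cluster target from this page):

```python
def survives_failover(avg_cpu_per_cluster, single_cluster_max=70):
    """With 2 equally sized clusters sharing the load, a failover roughly
    doubles the surviving cluster's CPU load. The survivor must stay at
    or under the single-cluster maximum (70% average) afterward."""
    return 2 * avg_cpu_per_cluster <= single_cluster_max

print(survives_failover(35))  # True: 35% doubles to 70%, right at the limit
print(survives_failover(50))  # False: 50% would double to 100%
```

This is why the recommended average for 2 clusters with multi-cluster routing is 35%, half of the single-cluster figure.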

What's next