To use Cloud Bigtable, you create instances, which contain 1 or 2 clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.
This page provides more information about Cloud Bigtable instances, clusters, and nodes.
- To learn how to create an instance, see Creating an Instance.
- To learn how to manage an instance's clusters, see Adding and deleting clusters.
- To learn how to monitor an instance and its clusters, see Monitoring an Instance.
- To learn how to update the number of nodes in a cluster, see Adding and removing nodes.
Before you read this page, you should be familiar with the overview of Cloud Bigtable.
Tables belong to instances, not to clusters or nodes. So if you have an instance with 2 clusters, you can't assign tables to individual clusters or create unique garbage-collection policies for each cluster. You also can't make each cluster store a different set of data in the same table.
An instance has a few important properties that you need to know about:
- The instance type (production or development)
- The storage type (SSD or HDD)
- The application profiles, for instances that use replication
The following sections describe these properties.
When you create an instance, you must choose what type of instance to create:
- Production: A standard instance with either 1 or 2 clusters, as well as 3 or more nodes in each cluster. You cannot downgrade a production instance to a development instance.
- Development: A low-cost instance for development and testing, with performance limited to the equivalent of a 1-node cluster. There are no monitoring or throughput guarantees; replication is not available; and the SLA does not apply. You can upgrade a development instance to a production instance at any time.
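The instance-type rules above can be expressed as a small check. This is a minimal sketch, not Cloud Bigtable client code: `InstanceSpec`, `validate_instance`, and the field names are hypothetical, and only the constraints stated on this page are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceSpec:
    """Hypothetical description of an instance; not a Cloud Bigtable API type."""
    instance_type: str            # "production" or "development"
    nodes_per_cluster: List[int]  # one entry per cluster


def validate_instance(spec: InstanceSpec) -> List[str]:
    """Return a list of rule violations; an empty list means the spec is valid."""
    problems = []
    cluster_count = len(spec.nodes_per_cluster)
    if spec.instance_type == "production":
        if cluster_count not in (1, 2):
            problems.append("a production instance has 1 or 2 clusters")
        if any(nodes < 3 for nodes in spec.nodes_per_cluster):
            problems.append("each production cluster needs 3 or more nodes")
    elif spec.instance_type == "development":
        # Replication is not available, so a second cluster is not allowed.
        if cluster_count != 1:
            problems.append("a development instance has exactly 1 cluster")
    else:
        problems.append("instance type must be production or development")
    return problems
```

For example, `validate_instance(InstanceSpec("production", [3, 3]))` returns an empty list, while a production spec with a 2-node cluster reports a violation.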
When you create an instance, you must also choose whether the instance's clusters will store data on solid-state drives (SSD) or hard disk drives (HDD). SSD is often, but not always, the most efficient and cost-effective choice.
The choice between SSD and HDD is permanent, and every cluster in your instance must use the same type of storage, so make sure you pick the right storage type for your use case. See Choosing Between SSD and HDD Storage for more information to help you decide.
After you create a production instance, Cloud Bigtable uses the instance to store application profiles, or app profiles. For instances that use replication, app profiles control how your applications connect to the instance's clusters. If your instance doesn't use replication, you can still use app profiles to provide separate identifiers for each of your applications, or each function within an application; you can then view separate charts for each app profile in the GCP Console.
A cluster represents the actual Cloud Bigtable service. Each cluster belongs to a single Cloud Bigtable instance, and an instance can have up to 2 clusters. When your application sends requests to a Cloud Bigtable instance, those requests are actually handled by one of the clusters in the instance.
Each cluster is located in a single zone.
An instance's clusters must be in unique zones that are within the same region.
For example, if the first cluster is in a different zone of the us-east1 region, us-east1-c is a valid zone for the second cluster.
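The zone rule can be sketched as a short check. This is an illustration only: `region_of` and `zones_are_valid` are hypothetical helpers, and the sketch assumes zone names have the form `<region>-<letter>`, as in `us-east1-c`.

```python
from typing import List


def region_of(zone: str) -> str:
    """Derive the region from a zone name, e.g. "us-east1-c" -> "us-east1".

    Assumes the zone name is the region name plus a "-<letter>" suffix.
    """
    return zone.rsplit("-", 1)[0]


def zones_are_valid(zones: List[str]) -> bool:
    """Check that an instance's clusters use unique zones in a single region."""
    if not 1 <= len(zones) <= 2:
        return False  # an instance has 1 or 2 clusters
    unique = len(set(zones)) == len(zones)
    same_region = len({region_of(z) for z in zones}) == 1
    return unique and same_region
```

Under these assumptions, two different zones in `us-east1` pass the check, while a repeated zone or zones from two regions fail it.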
For a list of zones and regions where Cloud Bigtable is available, see Cloud Bigtable
Cloud Bigtable instances with only 1 cluster do not use replication. If you add a second cluster to a production instance, then Cloud Bigtable automatically starts replicating your data by keeping separate copies of the data in each of the clusters' zones and synchronizing updates between the copies. You can choose which cluster your applications connect to, which makes it possible to isolate different types of traffic from one another, or you can let Cloud Bigtable balance traffic between clusters. If a cluster becomes unavailable, you can fail over from one cluster to another. To learn more about how replication works, see Overview of Replication.
Each cluster in a production instance has 3 or more nodes, which are compute resources that Cloud Bigtable uses to manage your data.
Behind the scenes, Cloud Bigtable splits all of the data from your tables into smaller tablets. Tablets are stored on disk, separate from the nodes but in the same zone as the nodes. Each node is responsible for keeping track of specific tablets on disk; handling incoming reads and writes for its tablets; and performing maintenance tasks on its tablets, such as periodic compactions. For more details about how Cloud Bigtable stores and manages data, see Cloud Bigtable architecture.
A cluster must have enough nodes to support its current workload and the amount of data it stores. Otherwise, the cluster might not be able to handle incoming requests, and latency could go up. You should monitor your clusters' CPU and disk usage, and add nodes if you exceed the recommendations and limits listed below.
Cloud Bigtable reports the following metrics for CPU usage:
- Average CPU utilization: The average CPU utilization across all nodes in the cluster. In general, this value should not exceed 70%, or 35% if you use replication with multi-cluster routing. This maximum provides headroom for brief spikes in usage. If a cluster exceeds this maximum value for more than a few minutes, add nodes to the cluster.
- CPU utilization of hottest node: CPU utilization for the busiest node in the cluster. In general, this value should not exceed 90%, or 45% if you use replication with multi-cluster routing. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data. Use the Key Visualizer tool to identify hotspots in your table that might be causing spikes in CPU utilization, and check your schema design to make sure it supports even distribution of reads and writes across each table.
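The CPU recommendations can be summarized in code. The following is a rough sketch with hypothetical names (`cpu_limits`, `cpu_advice`). It evaluates a single pair of readings, whereas the guidance above refers to values that stay high for more than a few minutes or are frequently exceeded, so a real check would look at a time window.

```python
def cpu_limits(multi_cluster_routing: bool) -> dict:
    """Recommended maximum CPU utilization, in percent, as stated on this page."""
    if multi_cluster_routing:
        return {"average": 35.0, "hottest_node": 45.0}
    return {"average": 70.0, "hottest_node": 90.0}


def cpu_advice(average: float, hottest_node: float,
               multi_cluster_routing: bool = False) -> str:
    """Map one pair of CPU readings (percent) to a suggested action."""
    limits = cpu_limits(multi_cluster_routing)
    if average > limits["average"]:
        return "add nodes"
    if hottest_node > limits["hottest_node"]:
        # Average is fine but one node is hot: likely uneven access.
        return "check for hotspots"
    return "ok"
```

Checking the average first reflects the order of the recommendations: a high average calls for more nodes, while a hot node with a reasonable average points to the schema or access pattern.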
Cloud Bigtable reports the following metrics for disk usage:
- Storage utilization (bytes): The amount of data stored in the cluster. This value affects your costs. Also, as described below, you might need to add nodes to each cluster as the amount of data increases.
- Storage utilization (% max): The percentage of the cluster's storage capacity that is being used. The capacity is based on the number of nodes in your cluster. In general, do not use more than 70% of the hard limit on total storage, so you have room to add more data. If you do not plan to add significant amounts of data to your instance, you can use up to 100% of the hard limit. If you are using more than the recommended percentage of the storage limit, add nodes to the cluster. You can also delete existing data, but deleted data takes up more space, not less, until a compaction occurs. For details about how this value is calculated, see Storage utilization per node.
- Disk load: The percentage your cluster is using of the maximum possible bandwidth for HDD reads and writes. Available only for HDD clusters. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage.
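The storage guidance amounts to simple arithmetic, sketched below. The per-node storage limit is not given on this page, so `bytes_per_node` is a caller-supplied assumption (see Storage utilization per node for the real value), and both function names are hypothetical.

```python
import math


def storage_percent_used(bytes_stored: int, nodes: int, bytes_per_node: int) -> float:
    """Percentage of the cluster's storage limit in use.

    bytes_per_node is an assumed per-node hard limit supplied by the caller.
    """
    return 100.0 * bytes_stored / (nodes * bytes_per_node)


def nodes_for_storage(bytes_stored: int, bytes_per_node: int,
                      target_percent: float = 70.0) -> int:
    """Smallest node count that keeps storage at or below target_percent."""
    usable_per_node = bytes_per_node * target_percent / 100.0
    return max(1, math.ceil(bytes_stored / usable_per_node))
```

Passing `target_percent=100.0` models the case where you do not plan to add significant amounts of data. The result covers storage only; CPU load or the production minimum of 3 nodes may require more.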
Nodes for replicated clusters
In an instance that uses replication, make sure each cluster has enough nodes to support your use case:
- If you use replication to provide high availability, or if you use multi-cluster routing in any of your app profiles, each cluster should have the same number of nodes. Also, as shown above under CPU usage, the recommended CPU utilization is reduced by half. This configuration helps ensure that if an automatic failover is necessary, the healthy cluster has enough capacity to handle all of your traffic.
- If all of your app profiles use single-cluster routing, each cluster can have a different number of nodes. Resize each cluster as needed based on the cluster's workload.
Because Cloud Bigtable stores a separate copy of your data with each cluster, each cluster must always have enough nodes to support your disk usage and to replicate writes between clusters.
You can still fail over manually from one cluster to another if necessary. However, if one cluster has many more nodes than another, and you need to fail over to the cluster with fewer nodes, you might need to add nodes first. There is no guarantee that additional nodes will be available when you need to fail over. The only way to reserve nodes in advance is to add them to your cluster.
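The node-count guidance for replicated instances can be sketched as a simple decision function. The name `replication_node_advice` and its return strings are hypothetical, and the sketch only compares node counts; it does not estimate whether a cluster can actually absorb the other cluster's traffic.

```python
def replication_node_advice(nodes_a: int, nodes_b: int,
                            uses_multi_cluster_routing: bool) -> str:
    """Compare node counts of the two clusters in a replicated instance."""
    if nodes_a == nodes_b:
        return "ok"
    if uses_multi_cluster_routing:
        # Automatic failover needs the healthy cluster to carry all traffic.
        return "resize: clusters should have the same number of nodes"
    # Single-cluster routing only: unequal sizes are allowed.
    return "ok: manual failover to the smaller cluster may need more nodes first"
```

Here `uses_multi_cluster_routing` should be true if any app profile uses multi-cluster routing or if replication is used for high availability.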