Time-series data is a highly valuable asset that you can use for several applications, including trending, monitoring, and machine learning. You can generate time-series data from server infrastructure, application code, and other sources. OpenTSDB can collect and retain large amounts of time-series data with a high degree of granularity.
This tutorial shows software engineers and architects how to create a scalable collection layer for time-series data by using Google Kubernetes Engine (GKE). It also shows how to work with the collected data by using Bigtable. This tutorial assumes that you are familiar with Kubernetes and Bigtable.
The following diagram shows the high-level architecture of this tutorial:
The preceding diagram shows multiple sources of time-series data, such as IoT events and system metrics, that are stored in Bigtable by using OpenTSDB deployed on GKE.
Objectives
- Create a Bigtable instance.
- Create a GKE cluster.
- Deploy OpenTSDB to your GKE cluster.
- Send time-series metrics to OpenTSDB.
- Visualize metrics using OpenTSDB and Grafana.
Costs
This tutorial uses the following billable components of Google Cloud:

- Bigtable
- Compute Engine
- Google Kubernetes Engine (GKE)
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.
Before you begin
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Bigtable, Bigtable Admin, GKE, and Compute Engine APIs.
- In the Cloud Console, go to the Dashboard page, and make a note of the project ID, because it's used in a later step.
- In the Cloud Console, activate Cloud Shell.
Preparing your environment
In Cloud Shell, set the default Compute Engine zone:
gcloud config set compute/zone COMPUTE_ZONE
Replace `COMPUTE_ZONE` with the zone where you are creating your Bigtable cluster, for example, `us-central1-b`.
Clone the git repository that contains the sample code:
git clone https://github.com/GoogleCloudPlatform/opentsdb-bigtable.git
Go to the sample code directory:
cd opentsdb-bigtable
Creating a Bigtable instance
This tutorial uses Bigtable to store the time-series data that you collect, so you must create a Bigtable instance.
Bigtable is a key/wide-column store that works well for time-series data. Bigtable supports the HBase API, so you can use software designed to work with Apache HBase, such as OpenTSDB. For more information about the HBase schema used by OpenTSDB, see HBase Schema.
A key component of OpenTSDB is the AsyncHBase client, which enables you to bulk-write to HBase in a fully asynchronous, non-blocking, thread-safe manner. When you use OpenTSDB with Bigtable, AsyncHBase is implemented as the AsyncBigtable client.
This tutorial uses a Bigtable instance with a single-node cluster. When moving to a production environment, consider using Bigtable instances with larger clusters. For more information about picking a cluster size, see Understanding Bigtable performance.
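If you later need more capacity, you can resize the cluster in place without downtime. The following is a minimal sketch that uses the `INSTANCE_ID` and `CLUSTER_ID` values that you set when creating the instance in the next step; the node count of 3 is only an example:

# Resize the Bigtable cluster to three nodes (example value).
gcloud bigtable clusters update CLUSTER_ID \
    --instance=INSTANCE_ID \
    --num-nodes=3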
Create a Bigtable instance:
gcloud bigtable instances create INSTANCE_ID \
    --cluster=CLUSTER_ID \
    --cluster-zone=COMPUTE_ZONE \
    --display-name=OpenTSDB \
    --cluster-num-nodes=1
Replace the following:
- `INSTANCE_ID`: the permanent identifier for the instance.
- `CLUSTER_ID`: the permanent identifier for the cluster.
- `COMPUTE_ZONE`: the zone where the cluster runs. We recommend using the default Compute Engine zone that you previously set.
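To confirm that the instance was created, you can list the Bigtable instances in your project:

# List the Bigtable instances in the current project; expect your new instance.
gcloud bigtable instances list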
Creating a GKE cluster
GKE provides a managed Kubernetes environment. After you create a GKE cluster, you can deploy Kubernetes Pods to it. This tutorial uses GKE and Kubernetes Pods to run OpenTSDB.
OpenTSDB separates its storage from its application layer, which enables it to be simultaneously deployed across multiple instances. By running in parallel, OpenTSDB can handle a large amount of time-series data.
In Cloud Shell, create a GKE cluster:
gcloud container clusters create CLUSTER_NAME --scopes \
    "https://www.googleapis.com/auth/devstorage.read_only",\
    "https://www.googleapis.com/auth/logging.write",\
    "https://www.googleapis.com/auth/monitoring",\
    "https://www.googleapis.com/auth/bigtable.data",\
    "https://www.googleapis.com/auth/bigtable.admin",\
    "https://www.googleapis.com/auth/servicecontrol",\
    "https://www.googleapis.com/auth/service.management.readonly",\
    "https://www.googleapis.com/auth/trace.append"
Replace `CLUSTER_NAME` with the name of your new cluster.
This operation can take a few minutes to complete. Adding the scopes to your GKE cluster allows your OpenTSDB container to interact with Bigtable and Container Registry.
The rest of this tutorial uses a prebuilt container, `gcr.io/cloud-solutions-images/opentsdb-bigtable:v2`, located in Container Registry. The Dockerfile and `entrypoint` script used to build the container are located in the `build` folder of the tutorial repository.

Connect to your GKE cluster:
gcloud container clusters get-credentials CLUSTER_NAME
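To verify that `kubectl` is now pointing at the new cluster, you can list its nodes:

# The nodes of the new GKE cluster should be listed with a Ready status.
kubectl get nodes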
Creating a ConfigMap with configuration details
Kubernetes uses the ConfigMap resource to decouple configuration details from the container image, in order to make applications more portable. The configuration for OpenTSDB is specified in the `opentsdb.conf` file. A ConfigMap containing the `opentsdb.conf` file is included with the sample code.
Edit the OpenTSDB configuration to use the project ID, instance identifier, and zone that you used when creating your instance.
- In Cloud Shell, click Open Editor.
- To open the ConfigMap file in the editor, go to `opentsdb-bigtable` > `configmaps` > `opentsdb-config.yaml`.
- In the `opentsdb-config.yaml` file, replace the placeholder text for the following variables with the values that you previously set:
  - Replace `google.bigtable.project.id` with your project ID.
  - Replace `google.bigtable.zone.id` with the value that you used for `COMPUTE_ZONE`.
  - Replace `google.bigtable.instance.id` with the value that you used for `INSTANCE_ID`.
- To change from the editor to the Cloud Shell terminal, click Open Terminal.
- Create a ConfigMap from the updated `opentsdb-config.yaml` file:

  kubectl create -f configmaps/opentsdb-config.yaml
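To check that your values were picked up, you can inspect the stored configuration. This assumes the ConfigMap object is named `opentsdb-config`, matching the file name in the sample repository:

# Print the opentsdb.conf contents stored in the ConfigMap.
kubectl describe configmap opentsdb-config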
Creating OpenTSDB tables in Bigtable
Before you can read or write data using OpenTSDB, you need to create tables in Bigtable to store that data.
To create the tables, launch a Kubernetes job in Cloud Shell:
kubectl create -f jobs/opentsdb-init.yaml
The job can take a minute or more to complete. Verify that the job completed successfully:
kubectl describe jobs
The output shows that the job succeeded when the `Pods Statuses` field shows `Success`.
Examine the table creation job logs:
pods=$(kubectl get pods --selector=job-name=opentsdb-init \
    --output=jsonpath={.items..metadata.name})
kubectl logs $pods
The output is similar to the following:
create 'tsdb-uid',
  {NAME => 'id', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'},
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 3.2730 seconds

create 'tsdb',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.8440 seconds

create 'tsdb-tree',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.5420 seconds

create 'tsdb-meta',
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW', DATA_BLOCK_ENCODING => 'DIFF'}
0 row(s) in 1.9910 seconds
The output lists each table that was created. The job runs several table creation commands, each of the form `create TABLE_NAME`. A table is successfully created when the output is of the form `0 row(s) in TIME seconds`, where:

- `TABLE_NAME`: the name of the table that the job creates
- `TIME`: the amount of time it took to create the table
Data model
The tables that you created store data points from OpenTSDB. In a later step, you write time-series data into these tables. Time-series data points are organized and stored as follows:
Field | Required | Description | Example
---|---|---|---
`metric` | Required | Item that is being measured (the default key) | `sys.cpu.user`
`timestamp` | Required | Unix epoch time of the measurement | `1497561091`
`tags` | At least one tag is required | Qualifies the measurement for querying purposes | `hostname=www cpu=0 env=prod`
`value` | Required | Measurement value | `89.3`
The metric, timestamp, and tags (tag key and tag value) form the row key. The timestamp is normalized to one hour, to ensure that a row does not contain too many data points. For more information, see HBase Schema.
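For example, a single data point that combines the fields in the preceding table looks like the following in OpenTSDB's telnet-style `put` format, where the tokens are the command, the metric, the timestamp, the value, and one or more tags:

put sys.cpu.user 1497561091 89.3 hostname=www cpu=0 env=prod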
Deploying OpenTSDB
The following diagram shows the architecture to deploy OpenTSDB by using Bigtable:
This tutorial uses two OpenTSDB Kubernetes deployments: one deployment sends metrics to Bigtable, and the other reads from it. Using two deployments prevents long-running reads and writes from blocking each other. The Pods in each deployment use the same container.
OpenTSDB provides a daemon called `tsd` that runs in each container. A single `tsd` process can handle a high throughput of events per second. To distribute load, each deployment in this tutorial creates three replicas of the read and write Pods.
In Cloud Shell, create a deployment for writing metrics:
kubectl create -f deployments/opentsdb-write.yaml
The configuration information for the write deployment is in the `opentsdb-write.yaml` file in the `deployments` folder of the tutorial repository.

Create a deployment for reading metrics:
kubectl create -f deployments/opentsdb-read.yaml
The configuration information for the read deployment is in the `opentsdb-read.yaml` file in the `deployments` folder of the tutorial repository.
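Verify that the Pods start correctly. Assuming the deployment names match the manifest file names, you should eventually see three `opentsdb-write` Pods and three `opentsdb-read` Pods in the `Running` state:

# List all Pods; look for the OpenTSDB read and write replicas.
kubectl get pods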
In a production deployment, you can increase the number of `tsd` Pods that are running, either manually or by using autoscaling in Kubernetes. Similarly, you can increase the number of instances in your GKE cluster manually or by using cluster autoscaler.
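For example, here is a minimal sketch of Pod autoscaling, assuming the write deployment is named `opentsdb-write` (matching the file name in the repository); the replica bounds and CPU threshold are arbitrary example values:

# Scale the write Pods between 3 and 10 replicas based on CPU usage.
kubectl autoscale deployment opentsdb-write --min=3 --max=10 --cpu-percent=80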
Creating OpenTSDB services
In order to provide consistent network connectivity to the deployments, you create two Kubernetes services: one service for writing metrics into OpenTSDB, and the other for reading them.
In Cloud Shell, create the service for writing metrics:
kubectl create -f services/opentsdb-write.yaml
The configuration information for the metrics writing service is contained in the `opentsdb-write.yaml` file in the `services` folder of the tutorial repository. This service is created inside your Kubernetes cluster and is reachable by other services running in your cluster.

Create the service for reading metrics:
kubectl create -f services/opentsdb-read.yaml
The configuration information for the metrics reading service is contained in the `opentsdb-read.yaml` file in the `services` folder of the tutorial repository.
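To confirm that both services were created and received cluster-internal addresses, list the services. The service names shown in the comment are assumed to match the manifest file names:

# Expect opentsdb-read and opentsdb-write entries with ClusterIP addresses.
kubectl get services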
Writing time-series data to OpenTSDB
There are several mechanisms to write data into OpenTSDB. After you define service endpoints, you can direct processes to begin writing data to them. This tutorial uses Heapster to demonstrate writing data. Your Heapster deployment collects data about Kubernetes and publishes metrics from the same GKE cluster in which OpenTSDB is running.
In Cloud Shell, deploy Heapster to your cluster:
kubectl create -f deployments/heapster.yaml
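To verify that Heapster started, check for its Pod; a simple check that makes no assumptions about the labels used in the sample manifest is:

# Look for a heapster Pod in the Running state.
kubectl get pods | grep heapster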
Examining time-series data with OpenTSDB
You can query time-series metrics by using the `opentsdb-read` service endpoint that you deployed earlier in the tutorial. You can use the data in various ways; one common option is to visualize it. OpenTSDB includes a basic interface to visualize metrics that it collects. This tutorial uses Grafana, a popular alternative for visualizing metrics that provides additional functionality.
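If you want to inspect the data directly before visualizing it, you can forward a local port to the `opentsdb-read` service and use OpenTSDB's HTTP API. The following is a sketch: it assumes the service exposes OpenTSDB's default port 4242, and the metric name in the query is only an example, so list the available metric names first:

# Forward local port 4242 to the read service in the cluster.
kubectl port-forward service/opentsdb-read 4242:4242 &

# List metric names that have been written so far.
curl 'http://localhost:4242/api/suggest?type=metrics&max=25'

# Query one hour of data for an example metric name.
curl 'http://localhost:4242/api/query?start=1h-ago&m=sum:cpu/usage_rate'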
Running Grafana in your cluster requires a process similar to the one you used to set up OpenTSDB. In addition to creating a ConfigMap and a deployment, you need to configure port forwarding so that you can access Grafana while it is running in your Kubernetes cluster.
In Cloud Shell, create the Grafana ConfigMap using the configuration information in the `grafana.yaml` file in the `configmaps` folder of the tutorial repository:

kubectl create -f configmaps/grafana.yaml
Create the Grafana deployment using the configuration information in the `grafana.yaml` file in the `deployments` folder of the tutorial repository:

kubectl create -f deployments/grafana.yaml
Get the name of the Grafana Pod in the cluster and use it to set up port forwarding:
grafana=$(kubectl get pods --selector=app=grafana \
    --output=jsonpath={.items..metadata.name})
kubectl port-forward $grafana 8080:3000
Verify that forwarding was successful. The output is similar to the following:
Forwarding from 127.0.0.1:8080 -> 3000
To connect to the Grafana web interface, in Cloud Shell, click Web Preview and then select Preview on port 8080.
For more information, see Using web preview.
A new browser tab opens and connects to the Grafana web interface. After a few moments, the browser displays graphs like the following:
This deployment of Grafana has been customized for this tutorial. The `configmaps/grafana.yaml` and `deployments/grafana.yaml` files configure Grafana to connect to the `opentsdb-read` service, allow anonymous authentication, and display some basic cluster metrics. For a deployment of Grafana in a production environment, we recommend that you implement proper authentication mechanisms and use richer time-series graphs.
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the individual resources
Delete the Kubernetes cluster to delete all the artifacts that you created:
gcloud container clusters delete CLUSTER_NAME
When you are prompted to delete the Kubernetes cluster, confirm by typing `Y`.

To delete the Bigtable instance, do the following:
In the Cloud Console, go to Bigtable.
Select the instance that you previously created, and then click Delete instance.
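Alternatively, you can delete the instance from Cloud Shell:

# Delete the Bigtable instance and all tables it contains.
gcloud bigtable instances delete INSTANCE_ID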
Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- To learn how to improve the performance of your use of OpenTSDB, see Bigtable Schema Design for Time Series Data.
- To learn how to migrate from HBase to Bigtable, see Migrating data from HBase to Bigtable.
- The video Bigtable in Action, from Google Cloud Next 17, describes field promotion—an important performance improvement.
- To learn more about default scopes for GKE clusters, see cluster scopes.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.