Using OpenTSDB to Monitor Time-Series Data on Cloud Platform

This tutorial describes how to collect, record, and monitor time-series data on Google Cloud Platform (GCP) using OpenTSDB running on Google Kubernetes Engine and Google Cloud Bigtable.

Time-series data is a highly valuable asset that you can use for several applications, including trending, monitoring, and machine learning. You can generate time-series data from server infrastructure, application code, and other sources. OpenTSDB can collect and retain large amounts of time-series data with a high degree of granularity.

This tutorial details how to create a scalable data collection layer using Kubernetes Engine and work with the collected data using Cloud Bigtable. The following diagram illustrates the high-level architecture of the solution:

High-level architecture diagram of this tutorial's solution for using OpenTSDB on GCP.

Objectives

  • Create a new Cloud Bigtable instance.
  • Create a new Kubernetes Engine cluster.
  • Deploy OpenTSDB to your Kubernetes Engine cluster.
  • Send time-series metrics to OpenTSDB.
  • Visualize metrics using OpenTSDB and Grafana.

Costs

This tutorial uses billable components of Cloud Platform, including:

  • Compute Engine
  • Kubernetes Engine
  • Cloud Bigtable
  • Google Cloud Storage

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

New Cloud Platform users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. Enable the Cloud Bigtable, Cloud Bigtable Admin, Compute Engine, and Kubernetes Engine APIs.

    Enable the APIs

Make note of the Project ID for use in a later step.

Preparing your environment

You will use Google Cloud Shell to enter commands in this tutorial. Cloud Shell gives you command-line access in the Google Cloud Platform Console, and it includes the Google Cloud SDK and other tools you need for Cloud Platform development. Cloud Shell appears as a window at the bottom of the Google Cloud Platform Console. The window appears immediately, but the session can take several minutes to initialize.

  1. Activate Cloud Shell.

    ACTIVATE CLOUD SHELL

  2. Set the default Compute Engine zone to the zone where you are going to create your Cloud Bigtable cluster, for example us-central1-f.

    gcloud config set compute/zone us-central1-f
    
  3. Clone the git repository containing the sample code.

    git clone https://github.com/GoogleCloudPlatform/opentsdb-bigtable.git
    
  4. Navigate to the sample code directory.

    cd opentsdb-bigtable
    

Creating a Cloud Bigtable instance

This tutorial uses Google Cloud Bigtable to store the time-series data that you collect. You must create a Cloud Bigtable instance to do that work.

Cloud Bigtable is a key/wide-column store that works especially well for time-series data, as is explained in Cloud Bigtable Schema Design for Time Series Data. Cloud Bigtable supports the HBase API, which makes it easy for you to use software designed to work with Apache HBase, such as OpenTSDB. You can learn about the HBase schema used by OpenTSDB in the OpenTSDB documentation.

A key component of OpenTSDB is the AsyncHBase client, which enables it to bulk-write to HBase in a fully asynchronous, non-blocking, thread-safe manner. When you use OpenTSDB with Cloud Bigtable, AsyncHBase is implemented as the AsyncBigtable client.

The ability to easily scale to meet your needs is a key feature of Cloud Bigtable. This tutorial uses a single-node development cluster, because it is sufficient for the task and is economical. You should start your projects in a development cluster, moving to a larger production cluster when you are ready to work with production data. The Cloud Bigtable documentation includes detailed discussion about performance and scaling to help you pick a cluster size for your own work.
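
When you later move to a production instance, you can resize its cluster from the command line instead of the console. The following is a minimal sketch; the instance and cluster IDs are placeholders, and it applies only to production clusters, because development clusters cannot be resized:

# Add nodes to an existing production Cloud Bigtable cluster (IDs are placeholders).
gcloud bigtable clusters update [CLUSTER_ID] --instance=[INSTANCE_ID] --num-nodes=3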

Follow these steps to create your Cloud Bigtable instance:

  1. Go to the Create Instance page in the GCP Console.

    GO TO THE CREATE INSTANCE PAGE

  2. Enter a name for your instance in the Instance name box. You can use OpenTSDB instance or another name of your choosing. The page automatically sets Instance ID and Cluster ID after you enter your instance name.

  3. Set Instance type to Development.

  4. In Zone, select us-central1-f or the zone from which you are going to run OpenTSDB.

  5. Click Create to create the instance.

Make note of the values of Instance ID and Zone. You will use them in a later step.
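
If you prefer to stay in Cloud Shell, you can create a comparable development instance with the gcloud command-line tool instead of the console. The IDs and display name below are examples, and the exact flags can vary between gcloud releases:

# Create a development Cloud Bigtable instance (example IDs; adjust to your own).
gcloud bigtable instances create opentsdb-bt \
    --instance-type=DEVELOPMENT \
    --cluster=opentsdb-bt-cluster \
    --cluster-zone=us-central1-f \
    --display-name="OpenTSDB instance"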

Creating a Kubernetes Engine cluster

Kubernetes Engine provides a managed Kubernetes environment. After you create a Kubernetes Engine cluster, you can deploy Kubernetes pods to it. This tutorial uses Kubernetes Engine and Kubernetes pods to run OpenTSDB.

OpenTSDB separates its storage from its application layer, which enables it to be deployed across multiple instances simultaneously. By running in parallel, it can handle a large amount of time-series data. Packaging OpenTSDB into a Docker container enables easy deployment at scale using Kubernetes Engine.

Create a Kubernetes cluster by running the following command. This operation can take a few minutes to complete:

gcloud container clusters create opentsdb-cluster --scopes \
"https://www.googleapis.com/auth/bigtable.admin",\
"https://www.googleapis.com/auth/bigtable.data"

Adding the two extra scopes to your Kubernetes cluster allows your OpenTSDB container to interact with Cloud Bigtable. You can pull images from Google Container Registry without adding a scope for Cloud Storage, because the cluster can read from Cloud Storage by default. You might need additional scopes in other deployments.

The rest of this tutorial uses a prebuilt container, gcr.io/cloud-solutions-images/opentsdb-bigtable:v1 located in Container Registry. The Dockerfile and ENTRYPOINT script used to build the container are located in the build folder of the tutorial repository.
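
If you want to build the image yourself instead of using the prebuilt one, something like the following builds it from the build folder and pushes it to Container Registry in your own project. The image tag is an example, and depending on your gcloud version you might use gcloud auth configure-docker followed by docker push instead:

# Build the OpenTSDB image from the tutorial's build folder and push it
# to Container Registry in your project (replace [PROJECT_ID]).
docker build -t gcr.io/[PROJECT_ID]/opentsdb-bigtable:v1 build/
gcloud docker -- push gcr.io/[PROJECT_ID]/opentsdb-bigtable:v1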

Creating a ConfigMap with configuration details

Kubernetes provides a mechanism called the ConfigMap to separate configuration details from the container image to make applications more portable. The configuration for OpenTSDB is specified in opentsdb.conf. A ConfigMap containing opentsdb.conf is included with the sample code. You must edit it to reflect your instance details.

Create the ConfigMap

Edit the OpenTSDB configuration to use the project name, instance identifier, and zone that you used when creating your instance.

  1. Open the code editor built into Cloud Shell by clicking the pencil icon in the toolbar at the top of the Cloud Shell window.
  2. Select opentsdb-config.yaml under opentsdb/configmaps to open it in the editor.
  3. Replace the placeholder text with the project name, instance identifier, and zone you set earlier in the tutorial. (A sketch of the edited file appears after these steps.)
  4. From the Cloud Shell prompt, create a ConfigMap from the updated opentsdb-config.yaml:

    kubectl create -f configmaps/opentsdb-config.yaml
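
For reference, the edited file should end up looking something like the following sketch. The resource name and the Bigtable-related property names shown here are an illustration based on the AsyncBigtable client configuration; defer to the keys that actually appear in opentsdb-config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: opentsdb-config
data:
  opentsdb.conf: |
    google.bigtable.project.id = example-project
    google.bigtable.instance.id = opentsdb-instance
    google.bigtable.zone.id = us-central1-f
    # ...the remaining OpenTSDB settings from the repository file stay unchanged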
    

Creating OpenTSDB tables in Cloud Bigtable

Before you can read or write data using OpenTSDB, you need to create the necessary tables in Cloud Bigtable to store that data. Follow these steps to create a Kubernetes job that creates the tables.

  1. Launch the job:

    kubectl create -f jobs/opentsdb-init.yaml
    
  2. The job can take a minute or more to complete. Verify that the job has completed successfully by periodically running this command:

    kubectl describe jobs
    

    The output should report 1 Succeeded under the Pods Statuses heading.

  3. Get the table creation job logs by running the following commands:

    pods=$(kubectl get pods --show-all --selector=job-name=opentsdb-init \
    --output=jsonpath={.items..metadata.name})
    
    kubectl logs $pods
    

When you get the logs, examine the bottom of the output, which should indicate each table that was created. This job runs several table creation commands, each taking the form of create 'TABLE_NAME'. Look for a line of the form 0 row(s) in 0.0000 seconds, where the actual command duration is listed instead of 0.0000.

Your output should include a section that looks something like the following:

create 'tsdb-uid',
  {NAME => 'id', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW'},
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW'}
0 row(s) in 1.3680 seconds

Hbase::Table - tsdb-uid

create 'tsdb',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW'}
0 row(s) in 0.6570 seconds

Hbase::Table - tsdb

create 'tsdb-tree',
  {NAME => 't', VERSIONS => 1, COMPRESSION => 'NONE', BLOOMFILTER => 'ROW'}
0 row(s) in 0.2670 seconds

Hbase::Table - tsdb-tree

create 'tsdb-meta',
  {NAME => 'name', COMPRESSION => 'NONE', BLOOMFILTER => 'ROW'}
0 row(s) in 0.5850 seconds

Hbase::Table - tsdb-meta

You only need to run this job once. It returns an error message if the tables already exist. You can continue the tutorial using existing tables, if present.

Data Model

The tables you just created will store data points from OpenTSDB. In a later step, you will write time-series data into these tables. Time-series data points are organized and stored as follows:

Field      Required                      Description                                       Example
metric     Required                      Item that is being measured - the default key     sys.cpu.user
timestamp  Required                      Epoch time of the measurement                     1497561091
value      Required                      Measurement value                                 89.3
tags       At least one tag is required  Qualifies the measurement for querying purposes   hostname=www, cpu=0, env=prod
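
To see how these fields fit together, you can push a single data point by hand through OpenTSDB's HTTP API once the write service created later in this tutorial is running. This is a minimal sketch that assumes the service is reachable on OpenTSDB's default port 4242, for example from another pod in the cluster or through a kubectl port-forward to localhost:

# Write one data point (the metric, tags, and endpoint host are examples).
curl -X POST "http://opentsdb-write:4242/api/put" \
  -H "Content-Type: application/json" \
  -d '{"metric": "sys.cpu.user", "timestamp": 1497561091, "value": 89.3,
       "tags": {"hostname": "www", "cpu": "0", "env": "prod"}}'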

Deploying OpenTSDB

The rest of this tutorial provides instructions for making the sample scenario work. The following diagram shows the architecture you will use:

Diagram of the architecture used in this tutorial to write, read, and visualize time-series data.

This tutorial uses two Kubernetes deployments. One deployment sends metrics into OpenTSDB and the other reads from it. Using two deployments prevents long-running reads and writes from blocking each other. The pods in each deployment use the same container. OpenTSDB provides a daemon called tsd that runs in each container.

A single tsd process can handle a high volume of events per second. To distribute the load, each deployment in this tutorial creates three replicas of its pods.
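
The deployment manifests in the repository follow the standard Kubernetes pattern. As a rough, illustrative sketch (the actual labels, resource settings, and configuration mounts are defined by the files in the deployments folder, and the apiVersion depends on your Kubernetes version), the write deployment looks roughly like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: opentsdb-write
spec:
  replicas: 3                  # three tsd pods share the write load
  template:
    metadata:
      labels:
        app: opentsdb
        role: write            # distinguishes write pods from read pods
    spec:
      containers:
      - name: opentsdb
        image: gcr.io/cloud-solutions-images/opentsdb-bigtable:v1
        ports:
        - containerPort: 4242  # tsd's default listening port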

Create a deployment for writing metrics

The configuration information for the writer deployment is in opentsdb-write.yaml in the deployments folder of the tutorial repository. Use the following command to create it:

kubectl create -f deployments/opentsdb-write.yaml

Create a deployment for reading metrics

The configuration information for the reader deployment is in opentsdb-read.yaml in the deployments folder of the tutorial repository. Use the following command to create it:

kubectl create -f deployments/opentsdb-read.yaml

In a production deployment, you can increase the number of tsd pods running manually or by using autoscaling in Kubernetes. Similarly, you can increase the number of instances in your Kubernetes Engine cluster manually or by using Cluster Autoscaler.
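
For example, assuming the deployment and cluster names used in this tutorial, commands along the following lines scale the write tier and turn on node autoscaling for the cluster's default node pool. The replica counts and node limits are placeholders:

# Scale the tsd write deployment to five replicas.
kubectl scale deployment opentsdb-write --replicas=5

# Or let Kubernetes adjust the replica count based on CPU usage.
kubectl autoscale deployment opentsdb-write --min=3 --max=10 --cpu-percent=80

# Enable Cluster Autoscaler on the Kubernetes Engine cluster's default node pool.
gcloud container clusters update opentsdb-cluster --enable-autoscaling \
    --min-nodes=3 --max-nodes=10 --node-pool=default-pool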

Creating OpenTSDB services

In order to provide consistent network connectivity to the deployments, you will create two Kubernetes services. One service is used to write metrics into OpenTSDB, and the other is used to read them.

Create the service for writing metrics

The configuration information for the metrics writing service is contained in opentsdb-write.yaml in the services folder of the tutorial repository. Use the following command to create the service:

kubectl create -f services/opentsdb-write.yaml

This service is created inside your Kubernetes cluster and is accessible to other services running in your cluster. In the next section of this tutorial you write metrics to this service.
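
As a rough sketch of what such a service looks like (the actual metadata and selector labels are defined in services/opentsdb-write.yaml, so treat this as illustrative only), the write service simply exposes the tsd port to other workloads inside the cluster:

apiVersion: v1
kind: Service
metadata:
  name: opentsdb-write
spec:
  selector:
    app: opentsdb
    role: write          # route traffic only to the write pods
  ports:
  - port: 4242           # tsd's default port
    targetPort: 4242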

Create the service for reading metrics

The configuration information for the metrics reading service is contained in opentsdb-read.yaml in the services folder of the tutorial repository. Use the following command to create the service:

kubectl create -f services/opentsdb-read.yaml

Writing time-series data to OpenTSDB

There are several mechanisms to write data into OpenTSDB. After you define service endpoints, you can direct processes to begin writing data to them. This tutorial uses Heapster to demonstrate writing data. Your Heapster deployment collects data about Kubernetes and publishes metrics from the Kubernetes Engine cluster on which you are running OpenTSDB.

Use the following command to deploy Heapster to your cluster:

kubectl create -f deployments/heapster.yaml
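
After Heapster has been running for a minute or two, you can spot-check that metrics are arriving by querying OpenTSDB's suggest endpoint through one of the read pods. This is a minimal sketch; the selector labels are assumptions, so adjust them to match the labels used in deployments/opentsdb-read.yaml:

# Forward a local port to one of the opentsdb-read pods, then list known metric names.
read_pod=$(kubectl get pods --selector=app=opentsdb,role=read \
  --output=jsonpath={.items[0].metadata.name})
kubectl port-forward $read_pod 4242:4242 &
curl "http://localhost:4242/api/suggest?type=metrics&max=10"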

Examining time-series data with OpenTSDB

You can query time-series metrics by using the opentsdb-read service endpoint that you deployed earlier in the tutorial. You can use the data in a variety of ways. One common option is to visualize it. OpenTSDB includes a basic interface to visualize metrics that it collects. This tutorial uses Grafana, a popular alternative for visualizing metrics that provides additional functionality.
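
For example, once you know a metric name (from the suggest endpoint shown earlier or from the OpenTSDB UI), you can fetch data points over HTTP. The metric name below is a placeholder; substitute one that Heapster actually reports in your cluster, and note that this assumes a port forward to an opentsdb-read pod is still active:

# Query the last hour of a metric, summed across all matching series (placeholder metric name).
curl "http://localhost:4242/api/query?start=1h-ago&m=sum:cluster.memory.usage"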

Set up Grafana

Running Grafana in your cluster follows a process similar to the one you used to set up OpenTSDB. In addition to creating a ConfigMap and a deployment, you need to configure port forwarding so that you can access Grafana while it is running in your Kubernetes cluster.

Use the following steps to set up Grafana:

  1. Create the Grafana ConfigMap using the configuration information in grafana.yaml in the configmaps folder of the tutorial repository.

    kubectl create -f configmaps/grafana.yaml
    
  2. Create the Grafana deployment using the configuration information in grafana.yaml in the deployments folder of the tutorial repository.

    kubectl create -f deployments/grafana.yaml
    
  3. Get the name of the Grafana pod in the cluster and use it to set up port forwarding.

    grafana=$(kubectl get pods --show-all --selector=app=grafana \
      --output=jsonpath={.items..metadata.name})
    
    kubectl port-forward $grafana 8080:3000
    
  4. Verify that forwarding was successful. The output should match the following:

    Forwarding from 127.0.0.1:8080 -> 3000
    Forwarding from [::1]:8080 -> 3000
    

Connect to the Grafana web interface

In Cloud Shell, click Web Preview and then select Preview on port 8080.

A new browser tab opens and connects to the Grafana web interface. After a few moments, the browser displays graphs like the following:

Example Grafana visualization

This deployment of Grafana has been customized for this tutorial. The files configmaps/grafana.yaml and deployments/grafana.yaml configure Grafana to connect to the opentsdb-read service, allow anonymous authentication, and display some basic cluster metrics. A deployment of Grafana in production would implement the proper authentication mechanisms and use richer time-series graphs.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. Delete the Kubernetes cluster to terminate all the artifacts previously created with the kubectl create command:

    gcloud container clusters delete opentsdb-cluster

    When prompted to confirm deletion of the Kubernetes cluster, type Y or press Enter.

  2. To delete the Cloud Bigtable instance, click Products & services in the Cloud Platform Console, click Bigtable, select the instance that you created earlier, and click Delete. Alternatively, you can delete the instance from Cloud Shell, as shown in the sketch below.
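
    The command-line equivalent is a single gcloud command. The instance ID is a placeholder; use the ID of the instance you created earlier:

    # Delete the Cloud Bigtable instance and all data stored in it.
    gcloud bigtable instances delete [INSTANCE_ID]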

What's next

  • To learn how to improve the performance of your uses of OpenTSDB, consult Cloud Bigtable Schema Design for Time Series Data.

  • The video Bigtable in Action from Google Cloud Next '17 describes field promotion and other performance considerations.

  • The documentation on cluster scopes for Kubernetes Engine Clusters describes default scopes, such as Cloud Storage, and scopes you can add for other Google services.

  • Try out other Cloud Platform features for yourself. Have a look at our tutorials.
