Running JanusGraph with Cloud Bigtable

This tutorial shows you how to run JanusGraph on Google Cloud (GCP). JanusGraph is a graph database that supports working with large amounts of data. Graph databases help you to discover insights by modeling your data entities and the relationships between them. In graph terminology, entities are known as nodes or vertices and relationships are known as edges. Both vertices and edges can have associated properties.

Figure 1. Example of a property graph

Graph databases help you model a variety of domains and activities:

  • Social networks
  • Fraud analysis
  • Physical networks

When creating graph databases, you sometimes create millions or even billions of vertices and edges. When you use JanusGraph with Cloud Bigtable as the underlying storage layer, you can both execute fast queries and scale your storage layer independently for the size and throughput that you need. Use this tutorial to deploy a scalable JanusGraph infrastructure with Cloud Bigtable, which you can then use to traverse the relationships that exist in any graph database.

Figure 2. JanusGraph deployment with Cloud Bigtable on GKE



This tutorial uses the following billable components of Google Cloud:

  • Compute Engine, which is used by GKE
  • Cloud Bigtable

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Cloud Bigtable, Cloud Bigtable Admin, Compute Engine, and GKE APIs.

When you finish this tutorial, you can avoid continued billing by deleting the resources that you created. See Cleaning up for more detail.

Preparing your environment

In this tutorial, you use Cloud Shell to enter commands. Cloud Shell gives you access to the command line in the Cloud Console and includes the Cloud SDK and other tools that you need to develop on Google Cloud. Cloud Shell appears as a window at the bottom of the Cloud Console. It can take several minutes to initialize, but the window appears immediately.

  1. Activate Cloud Shell.

  2. In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your Cloud Bigtable cluster and GKE cluster. This tutorial uses us-central1-f.

    gcloud config set compute/zone us-central1-f
  3. Create a GKE cluster to deploy JanusGraph:

    gcloud container clusters create janusgraph-tutorial \
        --cluster-version=1.15 \
        --machine-type n1-standard-4 \
        --scopes "",""
  4. Install Helm in your Cloud Shell environment:

    curl -fsSL -o
    chmod 700
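
Before moving on, it can help to confirm that the command-line tools this tutorial relies on are reachable from your shell. The following is a minimal sketch and not part of the original steps; the helper name missing_tools is hypothetical.

```sh
# Report which of the given commands are not on PATH.
# Prints one missing name per line; prints nothing when all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools that this tutorial calls from Cloud Shell.
missing=$(missing_tools gcloud kubectl helm)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "gcloud, kubectl, and helm are all available."
fi
```

If helm is reported as missing, repeat the Helm installation step before you continue.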

Creating a Cloud Bigtable instance

For the JanusGraph storage backend, this tutorial uses Cloud Bigtable, which can scale rapidly to meet your needs. This tutorial uses a single-node development cluster, which is both economical and sufficient for the tutorial. You can start your projects with a development cluster and then move to a larger production cluster when you are ready to work with production data. The Cloud Bigtable documentation includes a detailed discussion of performance and scaling to help you pick a cluster size for your own work.

Create your Cloud Bigtable instance by following these steps.

  1. In the Cloud Console, go to the Create Instance page.


  2. In the Instance name box, enter a name for your instance. You can use janusgraph or another lowercase name of your choosing. The page automatically sets Instance ID. Click Continue.

  3. For Select your storage type, select HDD. Click Continue.

  4. Cluster ID and Nodes are automatically set. For Region, select us-central1. For Zone, select us-central1-f or the zone where you created your GKE cluster earlier.

  5. Click Create to create the instance.

Make note of the instance ID, because you will use it in an upcoming step.
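
Because later steps depend on the instance ID, you may want to confirm from Cloud Shell that the instance exists. The following is a minimal sketch, assuming that the gcloud bigtable instances describe command is available in your Cloud SDK version and that your default project is the one that holds the instance. The helper name instance_exists is hypothetical.

```sh
# Succeeds when the named Cloud Bigtable instance can be described in
# the current gcloud project; fails otherwise.
instance_exists() {
  gcloud bigtable instances describe "$1" >/dev/null 2>&1
}

# Example usage (requires an authenticated gcloud):
#   if instance_exists janusgraph; then
#     echo "instance found"
#   else
#     echo "instance not found; check the ID and the active project"
#   fi
```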

Configuring Helm

You use Helm to deploy applications to your Kubernetes cluster. After creating your cluster, initialize a Helm chart repository.

  1. Run the following command in Cloud Shell:

    helm repo add stable
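
To confirm that the repository was registered, you can inspect the output of helm repo list. The sketch below assumes the tabular output (a header row followed by one row per repository) that helm repo list prints by default; helm_repo_registered is a hypothetical helper name.

```sh
# Succeeds when a chart repository with the given name appears in the
# output of `helm repo list`. Assumes the default table layout: one
# header row, then rows whose first column is the repository name.
helm_repo_registered() {
  helm repo list 2>/dev/null |
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example usage (requires helm):
#   helm_repo_registered stable && echo "stable repo is configured"
```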

Using Helm to install JanusGraph and Elasticsearch

In addition to using Cloud Bigtable as its storage backend, JanusGraph will use Elasticsearch as the indexing backend.

In this section, you use a Helm chart to deploy JanusGraph and Elasticsearch to your Kubernetes cluster. When you install the JanusGraph chart, Elasticsearch is included as a dependency, which simplifies the process.

  1. In Cloud Shell, set an environment variable to hold the value of the Cloud Bigtable instance ID that you noted earlier. Replace [YOUR_INSTANCE_ID] with the instance ID you specified earlier.

    export INSTANCE_ID=[YOUR_INSTANCE_ID]

    For example, if you used the default suggestion of janusgraph for your Cloud Bigtable instance ID, you would run:

    export INSTANCE_ID=janusgraph
  2. Create a values.yaml file, which supplies Helm with the specific configuration to use when installing JanusGraph:

    cat > values.yaml << EOF
    replicaCount: 3
    service:
      type: LoadBalancer
      serviceAnnotations: "Internal"
    elasticsearch:
      deploy: true
    properties:
      storage.backend: hbase null $INSTANCE_ID $GOOGLE_CLOUD_PROJECT
      storage.hbase.ext.hbase.client.connection.impl: elasticsearch null
      cache.db-cache: true
      cache.db-cache-clean-wait: 20
      cache.db-cache-time: 180000
      cache.db-cache-size: 0.5
    persistence:
      enabled: false
    EOF
  3. Deploy the JanusGraph Helm chart by using the values.yaml file that you created:

    helm upgrade --install --wait --timeout 600s janusgraph stable/janusgraph -f values.yaml

    The install waits until all of the resources are ready before it completes. This process might take several minutes.
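
The values.yaml step uses a here-document with an unquoted EOF delimiter, so the shell substitutes $INSTANCE_ID and $GOOGLE_CLOUD_PROJECT while it writes the file. If either variable is unset, the file silently receives an empty value. The following is a minimal sketch of a guard you could run first, plus a small demonstration of the expansion behavior. The names require_env and DEMO_ID are hypothetical and used only for illustration.

```sh
# Fail early when any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Demonstration with a throwaway value (not a real instance ID).
DEMO_ID=my-instance

# Unquoted delimiter: the shell expands variables in the body.
cat << EOF
expanded: $DEMO_ID
EOF

# Quoted delimiter: the body is written literally.
cat << 'EOF'
literal: $DEMO_ID
EOF
```

In this tutorial you could call require_env INSTANCE_ID GOOGLE_CLOUD_PROJECT before creating values.yaml. Cloud Shell normally sets GOOGLE_CLOUD_PROJECT for you, but that is an assumption worth checking in your own session.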

Verifying your JanusGraph deployment

When the helm install command finishes, it outputs a NOTES section that describes how to get started. From Cloud Shell, follow the steps that the NOTES section outlines to test whether your JanusGraph environment is working.

  1. Set an environment variable with the name of a Kubernetes pod that is running JanusGraph:

    export POD_NAME=$(kubectl get pods --namespace default -l "app=janusgraph,release=janusgraph" -o jsonpath="{.items[0].metadata.name}")
  2. Connect to the pod and run the Gremlin shell:

    kubectl exec -it $POD_NAME -- /janusgraph-0.2.0-hadoop2/bin/
  3. In the Gremlin console, connect to the Apache TinkerPop server:

    :remote connect tinkerpop.server conf/remote.yaml session
    :remote console

    The output looks similar to the following:

    gremlin> :remote connect tinkerpop.server conf/remote.yaml session
    ==>Configured localhost/[b08972f2-a2aa-4312-8018-bcd11bc9812c]
    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - [localhost/]-[b08972f2-a2aa-4312-8018-bcd11bc9812c] - type ':remote console' to return to local mode
  4. Run the following Gremlin commands to create two vertices and an edge:

    v1 = graph.addVertex(label, 'hello')
    v2 = graph.addVertex(label, 'world')
    v1.addEdge('followedBy', v2)

    The output looks similar to the following:

    gremlin> v1 = graph.addVertex(label, 'hello')
    gremlin> v2 = graph.addVertex(label, 'world')
    gremlin> v1.addEdge('followedBy', v2)
  5. Issue a Gremlin query to see what the label is for the vertex that follows an edge out from the vertex with the label hello:

    g.V().has(label, 'hello').out('followedBy').label()

    The query syntax is explained in the next section. For now, you see the word "world" as the output from the query:

    gremlin> g.V().has(label, 'hello').out('followedBy').label()
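
If the pod lookup in the first step returns nothing, the later kubectl exec command fails with a confusing error. The following is a minimal sketch of a wrapper that reports the problem explicitly. It assumes the same labels and namespace as the tutorial and uses the standard jsonpath expression for a pod name; the helper name janusgraph_pod is hypothetical.

```sh
# Print the name of the first pod that matches the JanusGraph release
# labels, or fail with a message when no such pod is found.
janusgraph_pod() {
  name=$(kubectl get pods --namespace default \
    -l "app=janusgraph,release=janusgraph" \
    -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) || name=""
  if [ -z "$name" ]; then
    echo "no JanusGraph pod found; is the Helm release deployed?" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Example usage (requires kubectl configured for the cluster):
#   POD_NAME=$(janusgraph_pod) && export POD_NAME
```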

Loading and querying a sample dataset

Now that you have deployed JanusGraph and can connect to it by using Gremlin, you can begin loading and querying your own data. To help demonstrate what that process looks like, load the sample dataset that comes bundled with JanusGraph: the Graph of the Gods, which depicts mythological deities and their location properties.

  1. While you are still in the Gremlin shell from the last section, enter the following command:

    GraphOfTheGodsFactory.load(graph)

    When the command completes, it returns null:

    gremlin> GraphOfTheGodsFactory.load(graph)
  2. With the sample graph loaded, you can issue graph traversal queries. For example, to find all brothers of Jupiter, enter the following query:

    g.V().has('name', 'jupiter').out('brother').values('name')

    You can break down this query by looking at the steps that it traverses:

    Traversal step            Explanation
    g.V()                     Start with the collection of vertices.
    has('name', 'jupiter')    Find one that has the property name with the value of jupiter.
    out('brother')            From there, follow any edges that are labeled brother.
    values('name')            For the vertices where those edges lead, get the name property.

    Here's the output of the query:

    gremlin> g.V().has('name', 'jupiter').out('brother').values('name')

To get more familiar with the traversal queries that are possible on this Graph of the Gods dataset, try out other sample queries from the JanusGraph docs.
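
To build intuition for what a has(...).out(...).values(...) traversal does, you can mimic it outside JanusGraph on a toy edge list. The following sketch is only an illustration: the edge list is hand-written to resemble a few relationships in the sample dataset, it is not read from JanusGraph, and out_step is a hypothetical helper name.

```sh
# Toy edge list, one edge per line: <source> <edge-label> <target>.
# Hand-written for illustration; not exported from the real dataset.
edges='jupiter brother neptune
jupiter brother pluto
jupiter father saturn
neptune brother jupiter'

# Mimic has('name', <vertex>).out(<label>).values('name'):
# keep edges that start at the vertex and carry the label, then
# print the name of the vertex each edge leads to.
out_step() {
  printf '%s\n' "$edges" |
    awk -v v="$1" -v l="$2" '$1 == v && $2 == l { print $3 }'
}

out_step jupiter brother
```

Each filter in the awk condition corresponds to one traversal step: the first comparison selects the starting vertex and the second selects the edge label to follow.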

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the Cloud Console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.
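
Deleting the project removes everything at once. If you would rather keep the project, the following sketch prints commands that remove only the resources created in this tutorial so that you can review them before running anything. It assumes Helm 3 (for helm uninstall), the release and cluster names used earlier, and the default zone that you configured; print_cleanup_commands is a hypothetical helper name.

```sh
# Print (do not run) the commands that remove the tutorial's resources.
# $1: Cloud Bigtable instance ID; defaults to the tutorial's suggestion.
print_cleanup_commands() {
  instance_id=${1:-janusgraph}
  cat << EOF
helm uninstall janusgraph
gcloud container clusters delete janusgraph-tutorial
gcloud bigtable instances delete $instance_id
EOF
}

print_cleanup_commands "${INSTANCE_ID:-janusgraph}"
```

Review the printed commands, then run them one at a time in Cloud Shell if they match what you created.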

What's next