Running JanusGraph with Cloud Bigtable

This tutorial shows you how to run JanusGraph on Google Cloud (GCP). JanusGraph is a graph database that supports working with large amounts of data. Graph databases help you to discover insights by modeling your data entities and the relationships between them. In graph terminology, entities are known as nodes or vertices and relationships are known as edges. Both vertices and edges can have associated properties.

Figure 1. Example of a property graph
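
To make the vocabulary concrete, here is a minimal in-memory sketch of a property graph. It is not how JanusGraph stores data; the class names, the `neighbors` helper, and the sample people are hypothetical and exist only to illustrate vertices, edges, and properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int  # id of the vertex the edge leaves
    in_v: int   # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: two vertices joined by one edge.
alice = Vertex(1, "person", {"name": "alice"})
bob = Vertex(2, "person", {"name": "bob"})
knows = Edge("knows", out_v=alice.id, in_v=bob.id, properties={"since": 2019})

vertices = {v.id: v for v in (alice, bob)}
edges = [knows]


def neighbors(vertex_id, edge_label):
    """Return the vertices reached by outgoing edges that carry the given label."""
    return [
        vertices[e.in_v]
        for e in edges
        if e.out_v == vertex_id and e.label == edge_label
    ]


print([v.properties["name"] for v in neighbors(alice.id, "knows")])  # ['bob']
```

Both the vertices and the edge carry their own properties (`name` and `since`), which is what distinguishes a property graph from a plain adjacency list.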

Graph databases help you model a variety of domains and activities:

  • Social networks
  • Fraud analysis
  • Physical networks

When creating graph databases, you sometimes create millions or even billions of vertices and edges. When you use JanusGraph with Bigtable as the underlying storage layer, you can both execute fast queries and scale your storage layer independently for the size and throughput that you need. Use this tutorial to deploy a scalable JanusGraph infrastructure with Bigtable, which you can then use to traverse the relationships that exist in any graph database.

Figure 2. JanusGraph deployment with Bigtable on GKE



This tutorial uses the following billable components of Google Cloud:

  • Compute Engine, which is used by GKE
  • Bigtable

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Bigtable, Bigtable Admin, Compute Engine, and GKE APIs.

When you finish this tutorial, you can avoid continued billing by deleting the resources that you created. See the Clean up section for details.

Preparing your environment

In this tutorial, you use Cloud Shell to enter commands. Cloud Shell gives you access to the command line in the Cloud Console and includes Cloud SDK and other tools that you need to develop in GCP. Cloud Shell appears as a window at the bottom of the Cloud Console. It can take several minutes to initialize, but the window appears immediately.

  1. Activate Cloud Shell.

  2. In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your Bigtable cluster and GKE cluster. This tutorial uses us-central1-f.

    gcloud config set compute/zone us-central1-f
  3. Create a GKE cluster to deploy JanusGraph:

    gcloud container clusters create janusgraph-tutorial \
        --cluster-version=1.15 \
        --machine-type n1-standard-4 \
        --scopes "",""
  4. Install Helm in your Cloud Shell environment:

    curl -fsSL -o
    chmod 700

Creating a Bigtable instance

For the JanusGraph storage backend, this tutorial uses Bigtable, which can scale rapidly to meet your needs. This tutorial uses a single-node development cluster, which is both economical and sufficient for the tutorial. You can start your projects with a development cluster and then move to a larger production cluster when you are ready to work with production data. The Bigtable documentation includes detailed discussion about performance and scaling to help you pick a cluster size for your own work.

Create your Bigtable instance by following these steps.

  1. In the Cloud Console, go to the Create Instance page:


  2. In the Instance name box, enter a name for your instance. You can use janusgraph or another lowercase name of your choosing. The page automatically sets Instance ID. Click Continue.

  3. For Select your storage type, select HDD. Click Continue.

  4. Cluster ID and Nodes are set automatically. For Region, select us-central1. For Zone, select us-central1-f or the zone where you created your GKE cluster earlier.

  5. Click Create to create the instance.

Make note of the instance ID, because you will use it in an upcoming step.

Configuring Helm

You use Helm to deploy applications to your Kubernetes cluster. After creating your cluster, initialize a Helm chart repository.

  1. Paste the following commands into Cloud Shell:

    helm repo add stable

Using Helm to install JanusGraph and Elasticsearch

In addition to using Bigtable as its storage backend, JanusGraph will use Elasticsearch as the indexing backend.

In this section, you use a Helm chart to deploy JanusGraph and Elasticsearch to your Kubernetes cluster. When you install the JanusGraph chart, Elasticsearch is included as a dependency, which simplifies the process.

  1. In Cloud Shell, set an environment variable to hold the Bigtable instance ID that you noted earlier. Replace [YOUR_INSTANCE_ID] with that instance ID.

    export INSTANCE_ID=[YOUR_INSTANCE_ID]

    For example, if you used the default suggestion of janusgraph for your Bigtable instance ID, you would run:

    export INSTANCE_ID=janusgraph
  2. Create a values.yaml file, which supplies Helm with the specific configuration to use when installing JanusGraph:

    cat > values.yaml << EOF
    replicaCount: 3
    service:
      type: LoadBalancer
      serviceAnnotations: "Internal"
    elasticsearch:
      deploy: true
    properties:
      storage.backend: hbase null $INSTANCE_ID $GOOGLE_CLOUD_PROJECT storage.hbase.ext.hbase.client.connection.impl: elasticsearch null
      cache.db-cache: true
      cache.db-cache-clean-wait: 20
      cache.db-cache-time: 180000
      cache.db-cache-size: 0.5
    persistence:
      enabled: false
    EOF
  3. Deploy the JanusGraph Helm chart by using the values.yaml file that you created:

    helm upgrade --install --wait --timeout 600s janusgraph stable/janusgraph -f values.yaml

    The install waits until all of the resources are ready before it completes. This process might take several minutes.
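
Because the here-document delimiter in step 2 is unquoted, the shell substitutes $INSTANCE_ID and $GOOGLE_CLOUD_PROJECT at the moment it writes values.yaml. An unset variable silently becomes an empty string in the file. The following is a minimal sketch of a pre-flight check you could run first; the function name `missing_vars` and the messages are hypothetical, not part of the tutorial's tooling.

```python
import os

# Variables that the values.yaml here-document expands.
REQUIRED_VARS = ("INSTANCE_ID", "GOOGLE_CLOUD_PROJECT")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Set these variables before creating values.yaml: " + ", ".join(missing))
    else:
        print("All variables referenced by values.yaml are set.")
```

Keeping the check as a pure function over a mapping makes it easy to exercise with a plain dictionary instead of the real environment.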

Verifying your JanusGraph deployment

When the helm install command finishes, it outputs a NOTES section that describes how to get started. From Cloud Shell, follow the steps that the NOTES section outlines to verify that your JanusGraph environment is working.

  1. Set an environment variable with the name of a Kubernetes pod that is running JanusGraph:

    export POD_NAME=$(kubectl get pods --namespace default -l "app=janusgraph,release=janusgraph" -o jsonpath="{.items[0].metadata.name}")
  2. Connect to the pod and run the Gremlin shell:

    kubectl exec -it $POD_NAME -- /janusgraph-0.2.0-hadoop2/bin/
  3. In the Gremlin console, connect to the Apache TinkerPop server:

    :remote connect tinkerpop.server conf/remote.yaml session
    :remote console

    The output looks similar to the following:

    gremlin> :remote connect tinkerpop.server conf/remote.yaml session
    ==>Configured localhost/[b08972f2-a2aa-4312-8018-bcd11bc9812c]
    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - [localhost/]-[b08972f2-a2aa-4312-8018-bcd11bc9812c] - type ':remote console' to return to local mode
  4. Run the following Gremlin commands to create two vertices and an edge:

    v1 = graph.addVertex(label, 'hello')
    v2 = graph.addVertex(label, 'world')
    v1.addEdge('followedBy', v2)

    The output looks similar to the following:

    gremlin> v1 = graph.addVertex(label, 'hello')
    gremlin> v2 = graph.addVertex(label, 'world')
    gremlin> v1.addEdge('followedBy', v2)
  5. Issue a Gremlin query to see what the label is for the vertex that follows an edge out from the vertex with the label hello:

    g.V().has(label, 'hello').out('followedBy').label()

    The query syntax is explained in the next section. For now, you see the word "world" as the output from the query:

    gremlin> g.V().has(label, 'hello').out('followedBy').label()
    ==>world
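
Step 1 of this section uses a jsonpath expression to pick a pod out of the list that kubectl returns. As a rough illustration of selecting the pod name at items[0].metadata.name, the sketch below applies the same lookup to a hand-written stand-in for `kubectl get pods -o json` output. The pod names and the `first_pod_name` helper are hypothetical.

```python
import json

# Hand-written, hypothetical stand-in for `kubectl get pods -o json` output.
pod_list_json = """
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "janusgraph-example-pod-a",
                  "labels": {"app": "janusgraph", "release": "janusgraph"}}},
    {"metadata": {"name": "janusgraph-example-pod-b",
                  "labels": {"app": "janusgraph", "release": "janusgraph"}}}
  ]
}
"""


def first_pod_name(raw_json):
    """Return the name of the first listed pod, like {.items[0].metadata.name}."""
    items = json.loads(raw_json)["items"]
    if not items:
        raise LookupError("no pods matched the label selector")
    return items[0]["metadata"]["name"]


print(first_pod_name(pod_list_json))  # janusgraph-example-pod-a
```

If the label selector matches no pods, the list is empty and there is no first item to read, which is why the sketch raises an error in that case rather than returning an empty name.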

Loading and querying a sample dataset

Now that you have deployed JanusGraph and can connect to it by using Gremlin, you can begin loading and querying your own data. To help demonstrate what that process looks like, load the sample dataset that comes bundled with JanusGraph: the Graph of the Gods, which depicts mythological deities and their location properties.

  1. While you are still in the Gremlin shell from the last section, enter the following command:

    GraphOfTheGodsFactory.load(graph)

    When the command completes, it returns null:

    gremlin> GraphOfTheGodsFactory.load(graph)
    ==>null
  2. With the sample graph loaded, you can issue graph traversal queries. For example, to find all brothers of Jupiter, enter the following query:

    g.V().has('name', 'jupiter').out('brother').values('name')

    You can break down this query by looking at the steps that it traverses:

    Traversal step            Explanation
    g.V()                     Start with the collection of vertices.
    has('name', 'jupiter')    Find one that has the property name with the value of jupiter.
    out('brother')            From there, follow any edges that are labeled brother.
    values('name')            For the vertices where those edges lead, get the name property.

    Here's the output of the query:

    gremlin> g.V().has('name', 'jupiter').out('brother').values('name')
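
The step-by-step breakdown above can also be read as a chain of filters over plain data. The following sketch emulates the four steps in Python over a small hand-entered toy graph. It does not talk to JanusGraph, the vertex ids and edge list are assumptions made for illustration, and the helper functions only mimic the Gremlin step names.

```python
# Hand-entered toy graph; ids and contents are illustrative only.
vertices = {
    1: {"name": "jupiter"},
    2: {"name": "neptune"},
    3: {"name": "pluto"},
    4: {"name": "saturn"},
}
# Each edge is (source vertex id, edge label, target vertex id).
edges = [
    (1, "brother", 2),
    (1, "brother", 3),
    (1, "father", 4),
]


def V():
    """Start with the ids of all vertices."""
    return list(vertices)


def has(ids, key, value):
    """Keep the vertices whose property `key` equals `value`."""
    return [i for i in ids if vertices[i].get(key) == value]


def out(ids, label):
    """Follow outgoing edges with the given label to their target vertices."""
    return [dst for i in ids for (src, lbl, dst) in edges if src == i and lbl == label]


def values(ids, key):
    """Read property `key` from each vertex that has it."""
    return [vertices[i][key] for i in ids if key in vertices[i]]


result = values(out(has(V(), "name", "jupiter"), "brother"), "name")
print(result)  # ['neptune', 'pluto'] for this toy graph
```

Each helper takes the output of the previous one, which mirrors how a Gremlin traversal passes traversers from step to step.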

To get more familiar with the traversal queries that are possible on this Graph of the Gods dataset, try out other sample queries from the JanusGraph docs.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  1. In the Cloud Console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next