Running JanusGraph with Bigtable

This tutorial shows you how to run JanusGraph on Google Cloud (GCP). JanusGraph is a graph database that supports working with large amounts of data. Graph databases help you to discover insights by modeling your data entities and the relationships between them. In graph terminology, entities are known as nodes or vertices and relationships are known as edges. Both vertices and edges can have associated properties.

Figure 1. Example of a property graph

Graph databases help you model a variety of domains and activities:

  • Social networks
  • Fraud analysis
  • Physical networks

When creating graph databases, you sometimes create millions or even billions of vertices and edges. When you use JanusGraph with Bigtable as the underlying storage layer, you can both execute fast queries and scale your storage layer independently for the size and throughput that you need. Use this tutorial to deploy a scalable JanusGraph infrastructure with Bigtable, which you can then use to traverse the relationships that exist in any graph database.

Figure 2. JanusGraph deployment with Bigtable on GKE

Objectives

Costs

This tutorial uses the following billable components of Google Cloud Platform:

  • Compute Engine, which is used by GKE
  • Bigtable

To generate a cost estimate based on your projected usage, use the pricing calculator. New GCP users might be eligible for a free trial.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud Platform project. Learn how to confirm billing is enabled for your project.

  4. Enable the Bigtable, Bigtable Admin, Compute Engine, and GKE APIs.

    Enable the APIs

When you finish this tutorial, you can avoid continued billing by deleting the resources that you created. See Cleaning up for more detail.

Preparing your environment

In this tutorial, you use Cloud Shell to enter commands. Cloud Shell gives you access to the command line in the Cloud Console and includes Cloud SDK and other tools that you need to develop in GCP. Cloud Shell appears as a window at the bottom of the Cloud Console. It can take several minutes to initialize, but the window appears immediately.

  1. Activate Cloud Shell.

    ACTIVATE Cloud Shell

  2. In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your Bigtable cluster and GKE cluster. This tutorial uses us-central1-f.

    gcloud config set compute/zone us-central1-f

  3. Create a GKE cluster to deploy JanusGraph:

    gcloud container clusters create janusgraph-tutorial --machine-type n1-standard-4 \
        --scopes "https://www.googleapis.com/auth/bigtable.admin","https://www.googleapis.com/auth/bigtable.data"
    
  4. Install Helm in your Cloud Shell environment:

    curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
    

Creating a Bigtable instance

For the JanusGraph storage backend, this tutorial uses Bigtable, which can scale rapidly to meet your needs. This tutorial uses a single-node development cluster, which is both economical and sufficient for the tutorial. You can start your projects with a development cluster and then move to a larger production cluster when you are ready to work with production data. The Bigtable documentation includes detailed discussion about performance and scaling to help you pick a cluster size for your own work.

Create your Bigtable instance by following these steps.

  1. In the Cloud Console, go to the Create Instance page:

    GO TO THE CREATE INSTANCE PAGE

  2. In the Instance name box, enter a name for your instance. You can use janusgraph or another lowercase name of your choosing. The page automatically sets Instance ID and Cluster ID after you enter your instance name.

  3. Set Instance type to Development.

  4. Under Zone, select us-central1-f or the zone where you created your GKE cluster earlier.

  5. Click Create to create the instance.

Make note of the instance ID, because you will use it in an upcoming step.

Configuring Helm

You use Helm to deploy applications to your Kubernetes cluster. After creating your cluster, configure Helm to work with the cluster.

  1. Paste the following commands into Cloud Shell:

    kubectl create serviceaccount tiller --namespace kube-system
    kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin \
        --serviceaccount=kube-system:tiller
    helm init --service-account=tiller
    until (timeout 7 helm version > /dev/null 2>&1); do echo "Waiting for tiller install..."; done
    

    You might see the output Waiting for tiller install... a few times, but when it stops, you can use Helm with your Kubernetes cluster.
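The wait loop above retries immediately and never gives up, so it spins indefinitely if Tiller fails to start. As a minimal sketch, assuming you prefer a bounded wait, the same polling idea can be wrapped in a small function. The name wait_for and its arguments are hypothetical and are not part of Helm or of this tutorial.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  # Run the command quietly; loop only while it keeps failing.
  until "$@" > /dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Waiting (attempt ${attempt} of ${max_attempts})..."
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example (commented out so that pasting the block has no side effects):
# poll Tiller up to 60 times, 5 seconds apart.
# wait_for 60 5 timeout 7 helm version
```

The function returns a non-zero status when it gives up, so a script can stop instead of moving on to helm install against a cluster where Tiller is not ready.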

Using Helm to install JanusGraph and Elasticsearch

In addition to using Bigtable as its storage backend, JanusGraph will use Elasticsearch as the indexing backend.

In this section, you use a Helm chart to deploy JanusGraph and Elasticsearch to your Kubernetes cluster. When you install the JanusGraph chart, Elasticsearch is included as a dependency, which simplifies the process.

  1. In Cloud Shell, set an environment variable to the Bigtable instance ID that you noted earlier, replacing [YOUR_INSTANCE_ID] with that ID:

    export INSTANCE_ID=[YOUR_INSTANCE_ID]

    For example, if you used the default suggestion of janusgraph for your Bigtable instance ID, you would run:

    export INSTANCE_ID=janusgraph

  2. Create a values.yaml file, which supplies Helm with the specific configuration to use when installing JanusGraph:

    cat > values.yaml << EOF
    replicaCount: 3
    service:
      type: LoadBalancer
      serviceAnnotations:
        cloud.google.com/load-balancer-type: "Internal"
    elasticsearch:
      deploy: true
    properties:
      storage.backend: hbase
      storage.directory: null
      storage.hbase.ext.google.bigtable.instance.id: $INSTANCE_ID
      storage.hbase.ext.google.bigtable.project.id: $GOOGLE_CLOUD_PROJECT
      storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase1_x.BigtableConnection
      index.search.backend: elasticsearch
      index.search.directory: null
      cache.db-cache: true
      cache.db-cache-clean-wait: 20
      cache.db-cache-time: 180000
      cache.db-cache-size: 0.5
    persistence:
      enabled: false
    EOF

  3. Deploy the JanusGraph Helm chart by using the values.yaml file that you created:

    helm install --wait --timeout 600 --name janusgraph stable/janusgraph -f values.yaml
    

    The install waits until all of the resources are ready before it completes. This process might take several minutes.
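Because the heredoc that writes values.yaml is unquoted, the shell substitutes $INSTANCE_ID and $GOOGLE_CLOUD_PROJECT at the moment the file is written. If either variable is unset, the file silently ends up with an empty value. The following is a minimal sketch of a check you could run before helm install. The function name check_values_file is hypothetical, and the check assumes that values.yaml has one key per line.

```shell
# Hypothetical helper (not part of the tutorial or of Helm): confirm that the
# two Bigtable settings in values.yaml received non-empty values.
check_values_file() {
  file="${1:-values.yaml}"
  status=0
  for key in storage.hbase.ext.google.bigtable.instance.id \
             storage.hbase.ext.google.bigtable.project.id; do
    # Match "key: value" where value contains at least one non-space character.
    # The dots in the key act as regex wildcards, which is harmless here.
    if grep -Eq "^[[:space:]]*${key}:[[:space:]]*[^[:space:]]+" "$file"; then
      echo "ok: ${key}"
    else
      echo "missing value: ${key}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_values_file with no arguments inspects values.yaml in the current directory. A non-zero exit status means that at least one of the two settings is empty, in which case export the missing variable and re-create the file.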

Verifying your JanusGraph deployment

When the helm install command finishes, it prints a NOTES section that describes how to get started. From Cloud Shell, follow the steps in the NOTES section to test whether your JanusGraph environment is working.

  1. Set an environment variable with the name of a Kubernetes pod that is running JanusGraph:

    export POD_NAME=$(kubectl get pods --namespace default -l "app=janusgraph,release=janusgraph" -o jsonpath="{.items[0].metadata.name}")
    
  2. Connect to the pod and run the Gremlin shell:

    kubectl exec -it $POD_NAME -- /janusgraph-0.2.0-hadoop2/bin/gremlin.sh
    
  3. In the Gremlin console, connect to the Apache TinkerPop server:

    :remote connect tinkerpop.server conf/remote.yaml session
    :remote console
    

    The output looks similar to the following:

    gremlin> :remote connect tinkerpop.server conf/remote.yaml session
    ==>Configured localhost/127.0.0.1:8182-[b08972f2-a2aa-4312-8018-bcd11bc9812c]
    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[b08972f2-a2aa-4312-8018-bcd11bc9812c] - type ':remote console' to return to local mode
    gremlin>
    
  4. Run the following Gremlin commands to create two vertices and an edge:

    v1 = graph.addVertex(label, 'hello')
    v2 = graph.addVertex(label, 'world')
    v1.addEdge('followedBy', v2)
    graph.tx().commit()
    

    The output looks similar to the following:

    gremlin> v1 = graph.addVertex(label, 'hello')
    ==>v[4344]
    gremlin>  v2 = graph.addVertex(label, 'world')
    ==>v[40964152]
    gremlin>  v1.addEdge('followedBy', v2)
    ==>e[17j-3co-4fmd-oe054][4344-followedBy->40964152]
    gremlin>
    
  5. Issue a Gremlin query to see what the label is for the vertex that follows an edge out from the vertex with the label hello:

    g.V().has(label, 'hello').out('followedBy').label()
    

    The query syntax is explained in the next section. For now, confirm that the query returns the label world:

    gremlin> g.V().has(label, 'hello').out('followedBy').label()
    ==>world
    

Loading and querying a sample dataset

Now that you have deployed JanusGraph and can connect to it by using Gremlin, you can begin loading and querying your own data. To help demonstrate what that process looks like, load the sample dataset that comes bundled with JanusGraph: the Graph of the Gods, which depicts mythological deities and their location properties.

  1. While you are still in the Gremlin shell from the last section, enter the following command:

    GraphOfTheGodsFactory.load(graph)
    

    When the command completes, it returns null:

    gremlin> GraphOfTheGodsFactory.load(graph)
    ==>null
    
  2. With the sample graph loaded, you can issue graph traversal queries. For example, to find all brothers of Jupiter, enter the following query:

    g.V().has('name', 'jupiter').out('brother').values('name')
    

    You can break down this query by looking at the steps that it traverses:

    Traversal step            Explanation
    g.V()                     Start with the collection of vertices.
    has('name', 'jupiter')    Find one that has the property name with the value of jupiter.
    out('brother')            From there, follow any edges that are labeled brother.
    values('name')            For the vertices where those edges lead, get the name property.

    Here's the output of the query:

    gremlin> g.V().has('name', 'jupiter').out('brother').values('name')
    ==>neptune
    ==>pluto
    

To get more familiar with the traversal queries that are possible on this Graph of the Gods dataset, try out other sample queries from the JanusGraph docs.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial, delete the project that contains them:

  1. In the GCP Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
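If you want to keep the project, an alternative is to delete only the resources that this tutorial created. The sketch below prints the commands for review instead of running them. The function name print_cleanup_commands is hypothetical, the release and cluster names are the ones used earlier in this tutorial, and the commands assume the Helm 2 client installed earlier and the default zone set in Cloud Shell.

```shell
# Hypothetical helper: print (do not run) the commands that remove the
# tutorial's Helm release, GKE cluster, and Bigtable instance.
# Usage: print_cleanup_commands INSTANCE_ID [CLUSTER_NAME]
print_cleanup_commands() {
  instance_id="${1:-}"
  cluster_name="${2:-janusgraph-tutorial}"
  if [ -z "$instance_id" ]; then
    echo "usage: print_cleanup_commands INSTANCE_ID [CLUSTER_NAME]" >&2
    return 1
  fi
  # Remove the Helm release first, while the cluster still exists.
  echo "helm delete --purge janusgraph"
  echo "gcloud container clusters delete ${cluster_name} --quiet"
  echo "gcloud bigtable instances delete ${instance_id} --quiet"
}
```

For example, print_cleanup_commands "$INSTANCE_ID" lists the three commands. Review them, and then paste each one into Cloud Shell. The --quiet flag skips the confirmation prompt, so omit it if you prefer to confirm each deletion.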

What's next
