Monitoring your Anthos infrastructure with Datadog

By: Maxim Brown, Technical Content Editor, Datadog

This tutorial describes how to set up and configure the Datadog cloud monitoring service to monitor your Anthos-managed infrastructure. You can use this tutorial to set up monitoring of Kubernetes-orchestrated services both on Anthos on Google Cloud and in environments where Anthos is deployed on VMware. Follow this tutorial if you are an administrator who wants to use Datadog to monitor your Anthos infrastructure.

This tutorial assumes the following:

  • You are familiar with Kubernetes and administering a cluster using the kubectl command-line tool.
  • You are an Anthos customer and have a cluster running in Anthos GKE.
  • You are a Datadog customer or are using a free Datadog trial.

Objectives

  • Enable Datadog's Google Cloud integration.
  • Send Cloud Monitoring metrics and Cloud Logging logs from your Anthos GKE cluster nodes into Datadog.
  • Deploy the Datadog Agent to your clusters to collect node-level and container-level information.
  • Configure the Agent to enable container-level log collection.
  • Deploy a sample Redis service to demonstrate the Agent's Autodiscovery feature.

Costs

This tutorial uses billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Additionally, any usage of Datadog beyond an initial 14-day trial is subject to standard billing.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Before you begin

  1. Make sure that you have an Anthos GKE cluster deployed and registered in the Cloud Console. The cluster must have outbound internet access. This tutorial assumes that there are three nodes in the cluster, but this is not mandatory. To deploy this Anthos component, follow the Anthos documentation.
  2. Make sure that you have the necessary permissions to create Kubernetes resources, including service accounts and cluster roles, on any cluster that you want to monitor. For more information about roles and permissions, see the Identity and Access Management (IAM) documentation.
  3. Set up an active Datadog account if you don't have one already.
  4. In the Cloud Console, go to the project selector page.
  5. Select the Cloud project that contains your cluster.

Understanding how Datadog monitors GKE

Datadog provides the following complementary ways to monitor a GKE cluster, which are included in this tutorial:

  • Datadog's Google Cloud integration
  • The open source Datadog Agent

Figure 1. Architectural overview of Anthos and Datadog integration.

The open source Datadog Agent reports cluster-state information, local system metrics, and metrics from the containers and services running on your nodes. You can deploy the Agent to any host, in the cloud or on-premises. Datadog aggregates all of this information so that you can monitor and visualize it together. The following diagram shows how you can monitor GKE clusters, Anthos clusters on VMware, and Anthos clusters on AWS.

Figure 2. Datadog Agents deployed on GKE, Anthos clusters on VMware, and Anthos clusters on AWS.

Enabling Datadog's Google Cloud integration

Datadog's Google Cloud integration uses a service account to make calls to the Cloud Monitoring API to collect node-level metrics from your Compute Engine instances.

To use Datadog to monitor multiple projects, for each project repeat the steps in the Enable the APIs, Create a service account, and Connect the service account to Datadog sections.

Enable the APIs

  1. In the Cloud Console, activate Cloud Shell.

    At the bottom of the Cloud Console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  2. In Cloud Shell, enable the Compute Engine and Cloud Monitoring APIs:
      gcloud services enable compute.googleapis.com monitoring.googleapis.com
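
Before you continue, you might want to confirm that both APIs are enabled. The following is a minimal sketch, not part of the original procedure. The missing_apis helper is hypothetical, and the sketch assumes that gcloud services list --enabled exposes service names through the config.name field.

```shell
# Hypothetical helper: print each API passed as an argument that is not yet
# enabled in the current project. Prints nothing if all of them are enabled.
missing_apis() {
  # Assumption: enabled service names are available as config.name.
  enabled="$(gcloud services list --enabled --format='value(config.name)')" || return 1
  for api in "$@"; do
    # -x matches the whole line, -F treats the API name as a fixed string.
    printf '%s\n' "$enabled" | grep -qxF "$api" || printf '%s\n' "$api"
  done
}
```

For example, running missing_apis compute.googleapis.com monitoring.googleapis.com prints nothing when both APIs are enabled.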
      

Create a service account

  1. In Cloud Shell, create a Datadog service account and store its email address in an environment variable:

    gcloud iam service-accounts create datadog-service-account \
        --display-name "Datadog Service Account"

    export DATADOG_SA="datadog-service-account@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com"
    
  2. Enable the Datadog service account to collect metrics, tags, events, and user labels by granting the following IAM roles:

    gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
        --member serviceAccount:${DATADOG_SA} \
        --role roles/cloudasset.viewer
    
    gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
        --member serviceAccount:${DATADOG_SA} \
        --role roles/compute.viewer
    
    gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
        --member serviceAccount:${DATADOG_SA} \
        --role roles/monitoring.viewer
    
  3. Create and download a service account key. You need the key file to complete the integration with Datadog:

    1. Create a service account key file in your Cloud Shell home directory:

      gcloud iam service-accounts keys create ~/key.json \
        --iam-account ${DATADOG_SA}
      
    2. In Cloud Shell, click More and then select Download File.

    3. In the File path field, enter key.json.

    4. To download the key file, click Download.
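
If you repeat these steps for several projects, the three role bindings can be expressed as a loop. The following is a sketch only. The grant_datadog_roles function name is hypothetical; the roles and the gcloud command are the ones used in the preceding steps.

```shell
# Hypothetical helper: grant the three viewer roles from this section to a
# service account. $1 is the project ID, $2 is the service account email.
grant_datadog_roles() {
  project="$1"
  member="serviceAccount:$2"
  for role in roles/cloudasset.viewer roles/compute.viewer roles/monitoring.viewer; do
    # Stop at the first binding that fails.
    gcloud projects add-iam-policy-binding "$project" \
      --member "$member" \
      --role "$role" || return 1
  done
}
```

To use the function with the variables from this section, run grant_datadog_roles "${GOOGLE_CLOUD_PROJECT}" "${DATADOG_SA}".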

Connect the service account to Datadog

  1. In your Datadog account, go to the Google Cloud integration tile.
  2. On the Configuration tab, click Upload Key File, and then select the key.json file that you saved in the previous section. Click Install/Update. Datadog is now integrated with the Cloud project.

    After you complete the integration, Datadog automatically creates a dashboard for Compute Engine. The dashboard provides visualizations of information like disk I/O, CPU utilization, and network traffic.

    Figure 3. Dashboard in Datadog that visualizes Compute Engine information.

Collecting logs with Cloud Logging

Next, you create a Pub/Sub topic with an HTTP-push forwarder to export Google Cloud service logs from Logging to Datadog.

Create and configure a Pub/Sub topic

  1. In Datadog, go to the Datadog API settings page and copy your API key. The key is a 32-character hexadecimal string.

  2. In Cloud Shell, export your API key to an environment variable:

    export DD_API_KEY=datadog-api-key
    

    Replace datadog-api-key with the API key that you copied in the previous step.

  3. Create a Pub/Sub topic to export logs:

    gcloud pubsub topics create export-logs-to-datadog
    
  4. Create a subscription to send logs from the Pub/Sub topic to Datadog:

    gcloud pubsub subscriptions create datadog-logs-subscription \
        --topic=export-logs-to-datadog \
        --push-endpoint=https://gcp-intake.logs.datadoghq.com/v1/input/${DD_API_KEY}/
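
Because the API key is embedded in the push endpoint URL, a mistyped key fails silently. The following sketch checks the key against the format described in step 1 (32 hexadecimal characters). The is_dd_api_key function name is hypothetical, and the check validates only the shape of the key, not whether Datadog accepts it.

```shell
# Hypothetical helper: succeed only if the argument is exactly 32
# hexadecimal characters, the key format described in this section.
is_dd_api_key() {
  case "$1" in
    # Reject any character that is not a hexadecimal digit.
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  [ "${#1}" -eq 32 ]
}

# Example use before you create the subscription:
# is_dd_api_key "${DD_API_KEY}" || echo "DD_API_KEY is not a 32-character hex string" >&2
```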
    

Export logs to the Pub/Sub topic

  1. In Cloud Shell, create a log sink that sends logs data to the Pub/Sub topic:

    gcloud logging sinks create datadog-logs-sink pubsub.googleapis.com/projects/${GOOGLE_CLOUD_PROJECT}/topics/export-logs-to-datadog \
        --log-filter="severity>=WARNING"
    

    The output is similar to the following:

    Created [https://logging.googleapis.com/v2/projects/your-project/sinks/datadog-logs-sink].
    Please remember to grant `serviceAccount:logs-sink-service-account` the Pub/Sub Publisher role on the topic.
    

    In this output:

    • your-project: Represents your Cloud project ID.
    • logs-sink-service-account: Represents a new service account for the logs sink. This service account is different from the Datadog service account that you previously created.
  2. Grant the logs sink's service account the Pub/Sub Publisher role on the topic. Replace logs-sink-service-account with the service account name from the previous output:

    gcloud pubsub topics add-iam-policy-binding export-logs-to-datadog \
      --member serviceAccount:logs-sink-service-account \
      --role roles/pubsub.publisher
    

    Within a few seconds, forwarded logs appear in the Datadog Log Explorer.
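
Instead of copying the sink's service account from the command output, you can look it up. The following sketch assumes that gcloud logging sinks describe reports the account in its writerIdentity field, and that the value already includes the serviceAccount: prefix. The grant_sink_publisher function name is hypothetical.

```shell
# Hypothetical helper: look up a sink's writer identity and grant it the
# Pub/Sub Publisher role on a topic. $1 is the sink name, $2 is the topic.
grant_sink_publisher() {
  sink="$1"
  topic="$2"
  # Assumption: writerIdentity is returned as "serviceAccount:<email>".
  member="$(gcloud logging sinks describe "$sink" --format='value(writerIdentity)')" || return 1
  [ -n "$member" ] || return 1
  gcloud pubsub topics add-iam-policy-binding "$topic" \
    --member "$member" \
    --role roles/pubsub.publisher
}
```

With the names used in this tutorial, the call is grant_sink_publisher datadog-logs-sink export-logs-to-datadog.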

Deploying the Datadog Agent

In this section, you deploy a containerized version of the Datadog Agent as a DaemonSet to your cluster. The Agent on each node collects information from the node's kubelet and from any containers running on the node, and forwards that information to Datadog. Repeat these steps for each cluster that you want to monitor.

Configure RBAC permissions for the Datadog Agent

If your environment has role-based access control (RBAC) enabled, configure RBAC permissions for your Datadog Agent service account by creating the appropriate ClusterRole, ServiceAccount, and ClusterRoleBinding resources.

  1. In a terminal configured to access your GKE cluster, deploy the manifests to the default namespace:

    kubectl create -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrole.yaml"
    
    kubectl create -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/serviceaccount.yaml"
    
    kubectl create -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrolebinding.yaml"
    
  2. To generate a Datadog API key, go to the Datadog API settings page, click Create API Key, and then enter a key name.

  3. In a terminal configured to access your GKE cluster, store your Datadog API key as a Kubernetes Secret. Replace datadog-api-key with the key that you created in the preceding step.

    kubectl create secret generic datadog-secret \
        --from-literal api-key="datadog-api-key"
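
To confirm that the Secret holds a complete key without printing the key itself, you can check its length. The following sketch assumes that kubectl's jsonpath output accepts the hyphenated field name api-key and that base64 -d is available, as it is on Debian-based systems. The secret_key_length function name is hypothetical.

```shell
# Hypothetical helper: print the decoded length, in bytes, of one key in a
# Kubernetes Secret. $1 is the Secret name, $2 is the key inside it.
secret_key_length() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" \
    | base64 -d \
    | wc -c \
    | tr -d ' '
}
```

For a valid Datadog API key, secret_key_length datadog-secret api-key should report 32.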
    

Deploy the Datadog Agent DaemonSet

There are different ways to deploy the Datadog Agent. For example, you can deploy Datadog in Kubernetes using Helm. For this tutorial, you use kubectl.

  1. In a text editor, create the datadog-agent.yaml manifest file and paste in the manifest text available on the Datadog Agent Kubernetes settings page.

  2. In a terminal configured to access your GKE cluster, deploy the manifest:

    kubectl apply -f datadog-agent.yaml
    
  3. Verify that the Agent was deployed correctly:

    kubectl get daemonset
    

    The output looks similar to the following:

    NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    datadog-agent   3         3         3       3            3           <none>          15d
    

    The DESIRED and CURRENT Pod counts should match the number of nodes running in your cluster. If you don't see the expected output, see Troubleshooting.

    When the Agent is running, node- and container-level system metrics, and Docker and Kubernetes metrics from your cluster are available in Datadog within minutes. Datadog automatically populates the Dashboards list with a new Kubernetes dashboard.

    Figure 4. Datadog's Kubernetes dashboard.
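
The comparison of the DESIRED and READY columns can also be scripted. The following sketch reads the desiredNumberScheduled and numberReady fields from the DaemonSet status. The daemonset_ready function name is hypothetical.

```shell
# Hypothetical helper: succeed only if every scheduled Pod of a DaemonSet
# is ready. $1 is the DaemonSet name.
daemonset_ready() {
  status="$(kubectl get daemonset "$1" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')" || return 1
  desired="${status% *}"   # text before the space
  ready="${status#* }"     # text after the space
  echo "desired=${desired} ready=${ready}"
  [ "${desired:-0}" -gt 0 ] && [ "$desired" -eq "$ready" ]
}
```

For example, daemonset_ready datadog-agent || echo "Agent is not ready on every node" reports a partial rollout.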

Troubleshooting

  1. If a Pod isn't starting or data isn't coming through, check the status of the Datadog Agent Pods:

    kubectl get pods --selector=app=datadog-agent
    

    The output looks similar to the following:

    NAME                  READY   STATUS    RESTARTS   AGE
    datadog-agent-44kbl   0/1     Pending   0          5m
    datadog-agent-hxbjt   1/1     Running   0          5m
    datadog-agent-l69nx   1/1     Running   0          5m
    

    A status other than Running indicates that the Pod didn't successfully start.

  2. Inspect the failed Pod's logs for errors. Replace datadog-agent-pod with the name of the failed Pod:

    kubectl logs datadog-agent-pod
    
  3. If you need to update your datadog-agent.yaml file and try again, reapply the changes:

    kubectl apply -f datadog-agent.yaml
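
On a cluster with many nodes, it can help to list only the Pods that need attention. The following sketch filters the table that kubectl get pods prints, assuming the default column layout shown in step 1 (NAME, READY, STATUS). The not_running_pods function name is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` table output on stdin and
# print the names of Pods whose STATUS column is not "Running".
not_running_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example use:
# kubectl get pods --selector=app=datadog-agent | not_running_pods
```

With the sample output from step 1, the function prints datadog-agent-44kbl.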
    

Add host tags

To filter and monitor parts of your environment, add host tags to the Datadog Agent. For example, if you have both Anthos clusters on VMware and cloud-based GKE clusters, you can assign tags to the nodes in each cluster to identify them as such.

  1. Add space-separated key-value pairs or simple tags to the DD_TAGS environment variable in your Datadog Agent DaemonSet manifest. The following example includes the tags env:anthos and role:dev.

    [...]
      env:
        - name: DD_TAGS
          value: "env:anthos role:dev"
    
    [...]
    
  2. In a terminal configured to access your GKE cluster, apply the changes:

    kubectl apply -f datadog-agent.yaml
    
  3. Go to the Datadog Host Map, and search for env:anthos. The view is filtered to matching hosts.

    Figure 5. Visualize your Anthos infrastructure in the Host Map.

Enable log collection

In this step, you enable log collection to send logs from your containers to Datadog.

  1. In a text editor, edit the datadog-agent.yaml file to add the following environment variables to your Datadog Agent DaemonSet manifest:

    [...]
    
      env:
        - name: DD_LOGS_ENABLED
          value: "true"
        - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
          value: "true"
        - name: DD_AC_EXCLUDE
          value: "name:datadog-agent"
    [...]
    

    Optional: To include Datadog Agent logs, omit the DD_AC_EXCLUDE parameter.

  2. In the same file, mount the pointdir volume:

      [...]
        volumeMounts:
          - name: pointdir
            mountPath: /opt/datadog-agent/run
      [...]
      volumes:
        - hostPath:
            path: /opt/datadog-agent/run
          name: pointdir
      [...]
    
  3. In a terminal configured to access your GKE cluster, apply the changes:

    kubectl apply -f datadog-agent.yaml
    

    Logs from your containers and the services running on them appear in the Datadog Log Explorer. These logs complement the logs that you collected with Logging in a previous section.
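
If logs don't appear, one thing to check is whether the applied DaemonSet actually carries the new environment variables. The following sketch assumes that kubectl set env with the --list flag prints one NAME=value pair per line. The logs_env_enabled function name is hypothetical.

```shell
# Hypothetical helper: succeed only if both log-collection variables from
# this section are set to "true" on the DaemonSet. $1 is the DaemonSet name.
logs_env_enabled() {
  # Assumption: --list prints one NAME=value pair per line.
  env_list="$(kubectl set env "daemonset/$1" --list)" || return 1
  for var in DD_LOGS_ENABLED DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL; do
    printf '%s\n' "$env_list" | grep -qxF "${var}=true" || {
      echo "missing: ${var}=true" >&2
      return 1
    }
  done
}
```

For example, logs_env_enabled datadog-agent returns a nonzero status and names the first missing variable if the manifest change was not applied.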

Using Autodiscovery

The Datadog Agent that you deployed in the previous section has Autodiscovery enabled by using the KUBERNETES=true environment variable. When Autodiscovery is enabled, the Agent container on each node determines what other containers on that node are running, and enables the appropriate Datadog Agent checks to start monitoring them. Autodiscovery allows Datadog to monitor your containerized services while the Pods running them are created and destroyed.

By default, the Agent includes auto-configuration for many common integrations. The Agent automatically discovers and starts monitoring any containers running these services, including Redis.

Next, you deploy a sample Redis service with the Autodiscovery feature.

Deploy a sample Redis service

  1. In a terminal, create a redis.yaml manifest file:

    cat <<EOF >redis.yaml
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 2
      selector:
        matchLabels:
          role: redis
      template:
        metadata:
          labels:
            role: redis
        spec:
          containers:
          - name: redis
            image: redis
            imagePullPolicy: Always
            ports:
            - name: redis
              containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
      labels:
        role: redis
    spec:
      ports:
      - port: 6379
        targetPort: 6379
      selector:
        role: redis
    EOF
    
  2. In a terminal configured to access your GKE cluster, deploy the service:

    kubectl apply -f redis.yaml
    

Monitor your Redis service in Datadog

The Datadog Agent autodiscovers the containers running the Redis service and starts forwarding their metrics. Because you enabled log collection in a previous section, the Agent forwards their logs as well. Within a few minutes, a Redis dashboard is added to the Dashboards list.

  1. To view the Redis dashboard, go to the Dashboards list.

    Figure 6. Datadog dashboard with Redis metrics.

    Now that log collection is enabled, you can access Redis logs from your containers in the Datadog Log Explorer.

  2. To access Redis logs, use the filter Source:redis.

    Figure 7. Datadog Log Explorer with Redis logs.
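
If the Redis dashboard doesn't appear, you can ask an Agent Pod whether Autodiscovery scheduled the check. The following is a coarse sketch. It assumes that the Agent container provides the agent status command and that the Redis check is reported under the name redisdb. The agent_runs_check function name is hypothetical.

```shell
# Hypothetical helper: succeed if the status report of an Agent Pod mentions
# a given check name. $1 is the Agent Pod name, $2 is the check name.
agent_runs_check() {
  # Assumption: `agent status` is available inside the Agent container.
  kubectl exec "$1" -- agent status | grep -qF -- "$2"
}
```

For example, agent_runs_check datadog-agent-hxbjt redisdb checks the Agent on one node. Because a Redis Pod runs on only some nodes, expect the check to appear only on the Agents that share a node with a Redis container.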

Cleaning up

After you complete this tutorial, follow these steps to disable Datadog's Google Cloud integration and remove any created resources:

  1. In your Datadog account, go to the Google Cloud integration tile and select Uninstall Integration.
  2. In the directory that contains the datadog-agent.yaml manifest, run the following command:

    kubectl delete -f datadog-agent.yaml
    
  3. Remove other resources created in this tutorial:

    kubectl delete -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrole.yaml"
    
    kubectl delete -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/serviceaccount.yaml"
    
    kubectl delete -f "https://raw.githubusercontent.com/DataDog/datadog-agent/master/Dockerfiles/manifests/rbac/clusterrolebinding.yaml"
    
    kubectl delete secret datadog-secret
    
  4. In the directory that contains your redis.yaml manifest, run the following command:

    kubectl delete -f redis.yaml
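
The preceding steps remove the Kubernetes resources. The Google Cloud resources that you created earlier (the log sink, the Pub/Sub subscription and topic, and the Datadog service account) can be removed in the same way. The following sketch uses the resource names from this tutorial. The delete_datadog_cloud_resources function name is hypothetical, and the --quiet flag is added on the assumption that you want to skip the confirmation prompts.

```shell
# Hypothetical helper: delete the Google Cloud resources created in this
# tutorial. $1 is the email of the Datadog service account.
delete_datadog_cloud_resources() {
  gcloud logging sinks delete datadog-logs-sink --quiet
  gcloud pubsub subscriptions delete datadog-logs-subscription --quiet
  gcloud pubsub topics delete export-logs-to-datadog --quiet
  # Deleting the service account also invalidates its downloaded key.
  gcloud iam service-accounts delete "$1" --quiet
}
```

In Cloud Shell, you can run delete_datadog_cloud_resources "${DATADOG_SA}" if the variable from the Create a service account section is still set.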
    

What's next