Customizing Cloud Logging logs for Google Kubernetes Engine with Fluentd

Last reviewed 2022-10-03 UTC

This tutorial describes how to customize Fluentd logging for a Google Kubernetes Engine (GKE) cluster. You'll learn how to host your own configurable Fluentd daemonset to send logs to Cloud Logging, instead of selecting the Cloud Logging option when creating the GKE cluster, which does not allow configuration of the Fluentd daemon.

Objectives

  • Deploy your own Fluentd daemonset on a Google Kubernetes Engine cluster, configured to log data to Cloud Logging. We assume that you are already familiar with Kubernetes.
  • Customize GKE logging to remove sensitive data from the Cloud Logging logs.
  • Customize GKE logging to add node-level events to the Cloud Logging logs.

Costs

This tutorial uses billable components of Google Cloud, including Google Kubernetes Engine, Compute Engine, and Cloud Logging.

The Pricing Calculator estimates the cost of this environment at around $1.14 for 8 hours.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Google Kubernetes Engine and Compute Engine APIs.

    Enable the APIs


Initializing common variables

You must define several variables that control where elements of the infrastructure are deployed.

  1. Using a text editor, edit the following script, substituting your project ID for [YOUR_PROJECT_ID]. The script sets the region to us-east1. If you make any changes to the script, make sure that the zone values reference the region you specify.

    export region=us-east1
    export zone=${region}-b
    export project_id=[YOUR_PROJECT_ID]
    
  2. Go to Cloud Shell.

    Open Cloud Shell

  3. Copy the script into your Cloud Shell window and run it.

  4. Run the following commands to set the default zone and project ID so you don't have to specify these values in every subsequent command:

    gcloud config set compute/zone ${zone}
    gcloud config set project ${project_id}
    

Creating the GKE cluster

  1. In Cloud Shell, clone the sample repository:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-customize-fluentd
    

    The sample repository includes the Kubernetes manifests for the Fluentd daemonset and a test logging program that you deploy later in this tutorial.

  2. Change your working directory to the cloned repository:

    cd kubernetes-engine-customize-fluentd
    
  3. Create the GKE cluster with system logging and monitoring only:

    gcloud container clusters create gke-with-custom-fluentd \
       --zone us-east1-b \
       --logging=SYSTEM \
       --tags=gke-cluster-with-customized-fluentd \
       --scopes=logging-write,storage-rw
    

Deploying the test logger application

By default, the sample application that you deploy continuously emits random logging statements. The Docker container is built from the source code in the test-logger subdirectory.

  1. In Cloud Shell, build the test-logger container image:

    docker build -t test-logger test-logger
    
  2. Tag the container before pushing to the registry:

    docker tag test-logger gcr.io/${project_id}/test-logger
    
  3. Push the container image:

    docker push gcr.io/${project_id}/test-logger
    
  4. Update the deployment file:

    envsubst < kubernetes/test-logger.yaml > kubernetes/test-logger-deploy.yaml
    
  5. Deploy the test-logger application to the GKE cluster:

    kubectl apply -f kubernetes/test-logger-deploy.yaml
    
  6. View the status of the test-logger Pods:

    kubectl get pods
    
  7. Repeat this command until the output looks like the following, with all three test-logger Pods running:

    Command output showing three pods running
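The envsubst step above renders the deployment manifest by substituting environment variables such as ${project_id} into the template. As a rough sketch of that substitution (the project ID below is a made-up value for illustration):

```ruby
# Simulate what envsubst does to the manifest template: replace
# ${VAR} placeholders with values from the environment.
ENV["project_id"] = "my-sample-project"  # assumed value for illustration

template = "image: gcr.io/${project_id}/test-logger"
rendered = template.gsub(/\$\{(\w+)\}/) { ENV[Regexp.last_match(1)].to_s }
```

This is why the `export project_id=...` step earlier matters: if the variable isn't set, the image reference in the rendered manifest is left incomplete.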

Deploying the Fluentd daemonset to your cluster

Next you configure and deploy your Fluentd daemonset.

  1. In Cloud Shell, deploy the Fluentd configuration:

    kubectl apply -f kubernetes/fluentd-configmap.yaml
    
  2. Deploy the Fluentd daemonset:

    kubectl apply -f kubernetes/fluentd-daemonset.yaml
    
  3. Check that the Fluentd Pods have started:

    kubectl get pods --namespace=kube-system
    

    If they're running, you see output like the following:

    Command output showing three pods running

  4. Verify that you're seeing logs in Logging. In the console, on the left-hand side, select Logging > Logs Explorer and then select Kubernetes Container as a resource type in the Resource list.

  5. Click Run Query.

  6. In the Logs field explorer, select test-logger for CONTAINER_NAME:

    Logging listing showing unfiltered data

Filtering information from the logfile

The next step is to specify that Fluentd should filter certain data so that it is not logged. For this tutorial, you filter out Social Security numbers, credit card numbers, and email addresses. To make this update, you change the daemonset to use a different ConfigMap that contains these filters. You use the Kubernetes rolling update feature and preserve the old version of the ConfigMap.

  1. Open the kubernetes/fluentd-configmap.yaml file in an editor.

  2. Uncomment the lines between the ### sample log scrubbing filters and ### end sample log scrubbing filters markers (but not the marker lines themselves):

    ############################################################################################################
    #  ### sample log scrubbing filters
    #  #replace social security numbers
    # <filter reform.**>
    #   @type record_transformer
    #   enable_ruby true
    #   <record>
    #     log ${record["log"].gsub(/[0-9]{3}-*[0-9]{2}-*[0-9]{4}/,"xxx-xx-xxxx")}
    #   </record>
    # </filter>
    # # replace credit card numbers that appear in the logs
    # <filter reform.**>
    #   @type record_transformer
    #   enable_ruby true
    #   <record>
    #      log ${record["log"].gsub(/[0-9]{4} *[0-9]{4} *[0-9]{4} *[0-9]{4}/,"xxxx xxxx xxxx xxxx")}
    #   </record>
    # </filter>
    # # replace email addresses that appear in the logs
    # <filter reform.**>
    #   @type record_transformer
    #   enable_ruby true
    #   <record>
    #     log ${record["log"].gsub(/[\w+\-]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+/i,"user@email.tld")}
    #   </record>
    # </filter>
    # ### end sample log scrubbing filters
    #############################################################################################################
  3. Change the name of the ConfigMap from fluentd-gcp-config to fluentd-gcp-config-filtered by editing the metadata.name field so that it reads:

    name: fluentd-gcp-config-filtered
    namespace: kube-system
    labels:
      k8s-app: fluentd-gcp-custom
  4. Save and close the file.
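The uncommented filters rely on Ruby gsub substitutions (enabled by enable_ruby in each record_transformer block). A standalone Ruby check of those same regexes, run against a made-up log line, shows what they redact:

```ruby
# Apply the tutorial's three scrubbing regexes to a sample log line.
# The SSN, card number, and email address below are fabricated.
line = "SSN 123-45-6789, card 4111 1111 1111 1111, mail jdoe@example.com"

# Replace Social Security numbers
line = line.gsub(/[0-9]{3}-*[0-9]{2}-*[0-9]{4}/, "xxx-xx-xxxx")
# Replace credit card numbers
line = line.gsub(/[0-9]{4} *[0-9]{4} *[0-9]{4} *[0-9]{4}/, "xxxx xxxx xxxx xxxx")
# Replace email addresses
line = line.gsub(/[\w+\-]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+/i, "user@email.tld")

puts line
```

Keep in mind that pattern-based scrubbing like this is best effort: the regexes can miss differently formatted values (for example, card numbers without spaces) or over-match unrelated digit sequences.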

Updating the Fluentd daemonset to use the new configuration

Now you change kubernetes/fluentd-daemonset.yaml to mount the ConfigMap fluentd-gcp-config-filtered instead of fluentd-gcp-config.

  1. Open the kubernetes/fluentd-daemonset.yaml file in an editor.

  2. Change the name of the ConfigMap from fluentd-gcp-config to fluentd-gcp-config-filtered by editing the configMap.name field so that it reads:

    - configMap:
        defaultMode: 420
        name: fluentd-gcp-config-filtered
      name: config-volume
  3. Deploy the new version of the ConfigMap to your cluster:

    kubectl apply -f kubernetes/fluentd-configmap.yaml
    
  4. Roll out the new version of the daemonset:

    kubectl apply -f kubernetes/fluentd-daemonset.yaml
  5. Watch the rollout status and wait for it to complete:

    kubectl rollout status ds/fluentd-gcp --namespace=kube-system
    

    Command output showing 'Waiting' messages for 3 pods, then success

  6. When the rollout is complete, refresh the Logging logs and make sure that the Social Security number, credit card number, and email address data has been filtered out.

    Logging listing showing the same data but filtered

Logging node-level events

If you want events that happen on your GKE nodes (in this example, sshd and sudo activity from the systemd journal) to show up in Logging as well, add the following lines to your ConfigMap and then follow the same update steps described in the previous section:

<source>
  @type systemd
  filters [{ "SYSLOG_IDENTIFIER": "sshd" }]
  pos_file /var/log/journal/gcp-journald-ssh.pos
  read_from_head true
  tag sshd
</source>

<source>
  @type systemd
  filters [{ "SYSLOG_IDENTIFIER": "sudo" }]
  pos_file /var/log/journal/gcp-journald-sudo.pos
  read_from_head true
  tag sudo
</source>
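The filters option in each source narrows the journal to matching entries. As a rough sketch of that matching semantics (not the plugin's actual code): an entry passes if it matches every field/value pair of at least one filter hash.

```ruby
# Illustrative matching logic for the "filters" option above.
# The journal entries here are fabricated examples.
filters = [{ "SYSLOG_IDENTIFIER" => "sshd" }]

entries = [
  { "SYSLOG_IDENTIFIER" => "sshd", "MESSAGE" => "Accepted publickey for alice" },
  { "SYSLOG_IDENTIFIER" => "cron", "MESSAGE" => "(root) CMD run-parts" },
]

# Keep only entries where some filter hash matches on all its fields.
matched = entries.select do |entry|
  filters.any? { |f| f.all? { |field, value| entry[field] == value } }
end
```

With the two sources above, only sshd and sudo journal entries are tailed and tagged, rather than the entire node journal.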

Clean up

After you've finished the tutorial, you can clean up the resources you created on Google Cloud so you won't be billed for them in the future.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting the GKE cluster

If you don't want to delete the whole project, run the following command to delete the GKE cluster:

gcloud container clusters delete gke-with-custom-fluentd --zone us-east1-b

What's next