Automating Canary Analysis on Google Kubernetes Engine with Spinnaker

This tutorial guides you through configuring the automated canary analysis feature of Spinnaker on Google Kubernetes Engine (GKE).

Introduction

Spinnaker is an open source, continuous delivery system led by Netflix and Google to manage the deployment of apps on different computing platforms, including App Engine, GKE, Compute Engine, AWS, and Azure. Using Spinnaker, you can implement advanced deployment methods, including canary deployments.

In a canary deployment, you expose a new version of your app to a small portion of your production traffic and analyze its behavior before going ahead with the full deployment. This lets you mitigate risks before deploying a new version to all of your users. To use canary deployments, you must accurately compare the behavior of the old and new versions of your app. The differences can be subtle and might take some time to appear. You might also have a lot of different metrics to examine.

To solve those problems, Spinnaker has an automated canary analysis feature: it reads the metrics of both versions from your monitoring system and runs a statistical analysis to automate the comparison. This tutorial shows you how to do an automated canary analysis on an app deployed on GKE and monitored by Stackdriver.

Spinnaker is an advanced app deployment and management platform for organizations with complex deployment scenarios, often with a dedicated release engineering function. You can run this tutorial without prior Spinnaker experience. However, implementing automated canary analysis in production is generally done by teams that already have Spinnaker experience and a strong monitoring system, and that know how to determine whether a release is safe.

About this tutorial

The app in this tutorial is a simple "Hello World" app whose error rate is configured with an environment variable. A pre-built Docker image for this app is provided. As illustrated in the following image, the app exposes metrics in the Prometheus format. Prometheus is an open source monitoring system that is popular in the Kubernetes community and compatible with Stackdriver.

Architecture of app

Objectives

  • Create a GKE cluster.
  • Install Spinnaker.
  • Deploy an app to GKE without a canary deployment.
  • Configure and run a canary deployment of the app.
  • Configure the automated canary analysis.
  • Test the automated canary analysis.

Costs

Before you begin

  1. Select or create a GCP project.

    GO TO THE MANAGE RESOURCES PAGE

  2. Enable billing for your project.

    ENABLE BILLING

  3. Create a Stackdriver account.

    GO TO THE STACKDRIVER DOCUMENTATION

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Setting up your environment

In this section, you configure the infrastructure required to complete the tutorial. Run all the terminal commands in this tutorial from Cloud Shell.

  1. Open Cloud Shell.

    GO TO CLOUD SHELL

  2. Export your project ID as a variable.

    export GOOGLE_CLOUD_PROJECT=[PROJECT_ID]
    

    where:

    • [PROJECT_ID] represents the ID of the project you are using.
  3. Create a GKE cluster.

    gcloud config set project $GOOGLE_CLOUD_PROJECT
    gcloud config set compute/zone us-central1-f
    gcloud services enable container.googleapis.com
    gcloud beta container clusters create kayenta-tutorial \
        --machine-type=n1-standard-2 --cluster-version=1.10 \
        --enable-stackdriver-kubernetes \
        --scopes=gke-default,compute-ro
    gcloud container clusters get-credentials kayenta-tutorial
    

  4. Install the Stackdriver-Prometheus integration plugin.

    kubectl apply --as=admin --as-group=system:masters -f \
        https://storage.googleapis.com/stackdriver-prometheus-documentation/rbac-setup.yml
    curl -sS "https://storage.googleapis.com/stackdriver-prometheus-documentation/prometheus-service.yml" | \
        sed "s/_stackdriver_project_id:.*/_stackdriver_project_id: $GOOGLE_CLOUD_PROJECT/" | \
        sed "s/_kubernetes_cluster_name:.*/_kubernetes_cluster_name: kayenta-tutorial/" | \
        sed "s/_kubernetes_location:.*/_kubernetes_location: us-central1-f/" | \
        kubectl apply -f -
    

  5. Deploy Spinnaker in your new GKE cluster.

    kubectl apply -f https://spinnaker.io/downloads/kubernetes/quick-install.yml
    

  6. The deployment takes a few minutes to complete. To check the progress, run the command watch kubectl -n spinnaker get pods. When the deployment is complete, the command shows all the pods as Ready 1/1. To stop the watch command, press Ctrl+C.

    NAME                                READY  STATUS    RESTARTS   AGE
    minio-deployment-7c665c4b57-jx7px   1/1    Running   0          5m
    spin-clouddriver-789c6fff77-rjtc6   1/1    Running   0          4m
    spin-deck-68b5968f7f-trmkn          1/1    Running   0          4m
    spin-echo-57dbff9fb8-rq5qc          1/1    Running   0          4m
    spin-front50-67965475b8-l24db       1/1    Running   0          4m
    spin-gate-6d8bbf8c45-m9pzn          1/1    Running   0          4m
    spin-halyard-59fd54bd69-xns49       1/1    Running   0          5m
    spin-kayenta-99b97b85f-4gvsv        1/1    Running   0          4m
    spin-orca-5748974888-cph9g          1/1    Running   0          4m
    spin-redis-6d49c9c5b9-q2hzm         1/1    Running   0          4m
    spin-rosco-6b4ddbcb94-mjrht         1/1    Running   0          4m
    

  7. To access Spinnaker, forward a local port to the deck component of Spinnaker.

    DECK_POD=$(kubectl -n spinnaker get pods -l \
        cluster=spin-deck,app=spin \
        -o=jsonpath='{.items[0].metadata.name}')
    kubectl -n spinnaker port-forward $DECK_POD 8080:9000 >/dev/null &
    

  8. In Cloud Shell, click the Web Preview icon and select Preview on port 8080.

    Web preview icon for port 8080.
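
Before moving on, you can optionally confirm that the Stackdriver-Prometheus collector you deployed in step 4 is running. The exact pod and namespace names depend on the manifests, so the following broad search is only a quick sketch:

    # Look for the Prometheus collector pods deployed by prometheus-service.yml.
    kubectl get pods --all-namespaces | grep -i prometheus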

Deploying an app with Spinnaker

In this section, you configure Spinnaker to deploy an app in the GKE cluster.

Create a Spinnaker app

Before you deploy, you create the Spinnaker app.

  1. In Spinnaker, click Actions > Create Application.

    Create application drop-down menu

  2. In the New Application dialog, enter the following values:

    • Name: sampleapp
    • Owner Email: [example@example.com]

  3. Click Create.

You are now in sampleapp, your new Spinnaker application. It isn't configured yet, so most of the tabs are empty.

Create and run a deployment pipeline

In this section, you first deploy the app with a simple Spinnaker pipeline that takes a successRate parameter and creates a GKE Deployment with four pods. Those pods return errors randomly, at a rate determined by the successRate parameter: in this tutorial, they return HTTP 500 errors at a rate of 100 - successRate percent.
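
If you are curious how the Success Rate parameter is wired into the pipeline, you can inspect simple-deploy.json with jq after you download it in step 1 below. This is optional; parameterConfig is the field that Spinnaker pipelines use for parameters, and the exact contents of the file might change over time:

    # Show the pipeline parameters defined in the downloaded pipeline JSON.
    jq '.parameterConfig' simple-deploy.json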

  1. In Cloud Shell, create the pipeline with the provided JSON file. The following command posts the JSON definition of the pipeline directly to the Spinnaker API.

    wget https://raw.githubusercontent.com/spinnaker/spinnaker/master/solutions/kayenta/pipelines/simple-deploy.json
    curl -d@simple-deploy.json -X POST \
        -H "Content-Type: application/json" -H "Accept: */*" \
        http://localhost:8080/gate/pipelines
    

  2. In the Pipelines section of Spinnaker, a pipeline called Simple deploy appears. If you don't see it, reload the page. Click Start Manual Execution.

    Start manual execution of simple deploy pipeline

  3. In the Confirm Execution window, select a Success Rate of 70, and then click Run. After a few seconds, the pipeline successfully deploys the configuration of the app and four pods.

  4. In Cloud Shell, create a pod that makes requests to your new app until the end of the tutorial.

    kubectl -n default run injector --image=alpine -- \
        /bin/sh -c "apk add --no-cache --yes curl; \
        while true; do curl -sS --max-time 3 \
        http://sampleapp:8080/; done"
    

Check the logs of the injector

  1. To see the behavior of the app, check the logs of the injector.

    kubectl -n default logs -f \
        $(kubectl -n default get pods -l run=injector \
        -o=jsonpath='{.items[0].metadata.name}')
    

  2. A high number of Internal Server Error messages appear in the logs. To stop following the logs of the injector, press Ctrl+C.
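
Instead of eyeballing the logs, you can quantify the error rate directly. The following sketch reuses the injector pod, which already has curl installed, to send 50 requests to the app and count the HTTP status codes it returns. With a success rate of 70, roughly 70% of the responses should be 200 and 30% should be 500:

    # Get the name of the injector pod.
    INJECTOR_POD=$(kubectl -n default get pods -l run=injector \
        -o=jsonpath='{.items[0].metadata.name}')
    # Send 50 requests from inside the cluster and count the returned status codes.
    kubectl -n default exec $INJECTOR_POD -- /bin/sh -c \
        'for i in $(seq 1 50); do curl -s -o /dev/null -w "%{http_code}\n" http://sampleapp:8080/; done' \
        | sort | uniq -c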

Check the health of your app

Now that your app is deployed and serves traffic, see if it's behaving correctly. Of course, in this tutorial, you already know that it isn't because you deployed the app with only a 70% success rate.

The app exposes a /metrics endpoint with metrics in the Prometheus format that are ingested by Stackdriver. In this section, you visualize those metrics in Stackdriver.
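
Before looking at Stackdriver, you can optionally peek at the raw Prometheus output. The command below assumes that the /metrics endpoint is served on the same port as the app; if the pre-built image exposes it on another port (for example, through a prometheus.io/port annotation on the Deployment), adjust the port accordingly. The metric and label names you see are the raw counterparts of what Stackdriver ingests:

    # Fetch the first lines of the Prometheus exposition format from inside the cluster.
    kubectl -n default exec \
        $(kubectl -n default get pods -l run=injector \
        -o=jsonpath='{.items[0].metadata.name}') -- \
        curl -s http://sampleapp:8080/metrics | head -n 20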

  1. In Stackdriver, go to Metrics Explorer.

    METRICS EXPLORER

  2. In the Find resource type and metric field, enter the following:

    external.googleapis.com/prometheus/requests
    

  3. To refine the graph, in the Group By field, enter http_code. In the following graph, the rates of HTTP requests answered by the app are grouped by HTTP status code:

    Graph of HTTP requests answered by the app

As you can see in the graph, the app currently has an unacceptable error rate—around 30%, as expected. The rest of the tutorial guides you through the setup of a canary deployment pipeline and an automatic analysis to prevent future deployments of an app with such a high error rate.

Creating a canary deployment

In this section, you create a canary deployment pipeline, without automated analysis, to test the new version of the app before deploying it fully to production. In the following image, different stages of this pipeline are outlined:

Illustration of the stages of a canary deployment pipeline

  • Step 0: Like in the Simple Deploy pipeline, the pipeline takes a Success Rate parameter as input. This new pipeline uses this parameter to simulate different success rates. This is the Configuration of the pipeline.
  • Step 1: The Find Baseline Version stage retrieves the current version of the app running in production from the latest execution of the Simple Deploy pipeline. In this tutorial, it retrieves the success rate of the currently deployed app.
  • In parallel with the Find Baseline Version stage, the Deploy Canary Config stage deploys the new success rate configuration for the canary version of the app.
  • Step 2: The Deploy Canary and Deploy Baseline stages deploy the two versions for comparison, the new canary version and a baseline version. The canary version uses the configuration created in Deploy Canary Config whereas the baseline version uses the configuration used by the production version.

  • Step 3: The Manual Judgment stage stops the pipeline until you continue. During this stage, you can check if the canary version behaves correctly.

  • Step 4: Once you continue past the Manual Judgment stage, both the Delete Canary and Delete Baseline stages clean up the infrastructure.
  • In parallel with the cleanup, the Deploy to Production stage is launched and triggers the Simple Deploy pipeline with the same Success Rate parameter that you gave initially. The same version of the app you tested in a canary is deployed in production.
  • The Deploy to Production stage is triggered only if you chose to continue during the Manual Judgment stage.
  • Step 5: Finally, the Successful Deployment stage validates that the whole pipeline is successful. It checks that you gave the go-ahead in the Manual Judgment stage and only executes if the Deploy to Production, Delete Canary, and Delete Baseline stages executed successfully.
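
If you want to see how these stages map to the underlying pipeline definition, you can list the stage graph with jq once you have downloaded canary-deploy.json in the next step. The field names used below (refId, name, type, requisiteStageRefIds) follow Spinnaker's pipeline JSON format:

    # List each stage with its identifier, type, and the stages it depends on.
    jq '.stages[] | {refId, name, type, dependsOn: .requisiteStageRefIds}' canary-deploy.json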

Now, you can create and run the Canary Deploy pipeline.

  1. To create the Canary Deploy pipeline, run the following command to fetch the ID of the Simple deploy pipeline and inject it into the Canary Deploy pipeline:

    wget https://raw.githubusercontent.com/spinnaker/spinnaker/master/solutions/kayenta/pipelines/canary-deploy.json
    export PIPELINE_ID=$(curl \
        localhost:8080/gate/applications/sampleapp/pipelineConfigs/Simple%20deploy \
        | jq -r '.id')
    jq '(.stages[] | select(.refId == "9") | .pipeline) |= env.PIPELINE_ID | (.stages[] | select(.refId == "8") | .pipeline) |= env.PIPELINE_ID' canary-deploy.json | \
        curl -d@- -X POST \
        -H "Content-Type: application/json" -H "Accept: */*" \
        http://localhost:8080/gate/pipelines
    

  2. If you don't see the Canary Deploy pipeline in Spinnaker, reload the sampleapp page, and click Pipelines.

  3. To launch the Canary Deploy pipeline:

    1. Click Start Manual Execution.
    2. Select a Success Rate of 80.
    3. Click Run.
  4. When the pipeline reaches the Manual Judgment stage, don't click Continue yet because you need to compare the canary version with the baseline version.

    Manual judgement stage of the canary pipeline

  5. In Cloud Shell, run the kubectl -n default get pods command to see the new pods labeled canary and baseline:

    NAME                                READY STATUS  RESTARTS  AGE
    injector-66bd655ffd-9ntwx           1/1   Running 0         30m
    sampleapp-5cdf8f55dd-995rz          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-dqq8n          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-ntq57          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-rlpzp          1/1   Running 0         28m
    sampleapp-baseline-567b8d6849-gsgqr 1/1   Running 0          4m
    sampleapp-canary-54b9759dd6-gmjhc   1/1   Running 0          4m
    

  6. In Stackdriver, go to Metrics Explorer.

    METRICS EXPLORER

  7. If there are any metrics already configured in Metrics Explorer, remove all existing configuration from the form.

  8. Select the canary data as your first metric, specifying the following parameters:

    1. Metric: external.googleapis.com/prometheus/requests
    2. Filters:

      • http_code equals 500
      • pod_name equals sampleapp-canary-*
  9. To select the baseline data as your second metric, click + Add Metric, and complete the following fields:

    1. Metric: external.googleapis.com/prometheus/requests
    2. Filters:

      • http_code equals 500
      • pod_name equals sampleapp-baseline-*
  10. Compare the canary version (purple in the following graph) with the baseline version (blue in the following graph). Colors might differ in your graph. In this tutorial, the canary version has a lower error rate than the baseline version. Therefore, it is safe to fully deploy the canary version to production. If the canary version didn't have a lower error rate, you might want to stop the deployment at this stage and make some corrections to your app.

    Graph that compares the canary error rate with the baseline version

  11. In Spinnaker, in the Manual Judgment dialog, click Continue.

    Manual judgement stage of the canary pipeline

  12. When the deployment is finished, in Stackdriver, go back to Metrics Explorer.

    METRICS EXPLORER

  13. If there are any metrics already configured in Metrics Explorer, remove all existing configuration from the form.

  14. In the Find resource type and metric field, enter the following:

    external.googleapis.com/prometheus/requests
    

  15. In the Group By field, enter http_code. In the following graph, the rate of HTTP requests answered by the app is split by HTTP status code:

    Graph that compares the rate of HTTP requests

    This graph shows the rate of HTTP 200 and 500 responses for all pods: production, baseline, and canary. Because the canary version had a lower error rate, you deployed it to production. After a brief dip in the total number of requests during the deployment, you can see that the overall error rate drops: the canary version has been deployed to production correctly.
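
You can also confirm from the command line that the Delete Canary and Delete Baseline stages cleaned up after themselves. Running the same command as before should now show only the injector and the four production sampleapp pods, with the baseline and canary pods gone:

    kubectl -n default get pods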

Automating canary analysis

A canary deployment is useful, but in its current implementation, it's a manual process. You have to manually check that the canary behaves as you want before doing a full deployment, and the difference between canary and baseline isn't always clear.

Automating the canary analysis is a good idea: you don't have to do it yourself, and an automated statistical analysis is better suited than a human to detecting problems across a large set of metrics. In this section, you replace the Manual Judgment stage with an automated canary analysis.

Enable canary support

First, in Spinnaker you configure the automated canary analysis feature, called Kayenta. To configure Kayenta, use Halyard, the same tool used to configure and deploy Spinnaker.

  1. In Cloud Shell, get your project ID.

    echo $GOOGLE_CLOUD_PROJECT
    

  2. Get a shell in the Halyard pod.

    export HALYARD_POD=$(kubectl -n spinnaker get pods -l \
        stack=halyard,app=spin \
        -o=jsonpath='{.items[0].metadata.name}')
    kubectl -n spinnaker exec -it $HALYARD_POD -- bash
    

  3. Configure Kayenta to use Stackdriver as its backend.

    hal config canary google enable
    hal config canary google account add kayenta-tutorial --project [PROJECT_ID]
    hal config canary google edit --stackdriver-enabled=true
    

    where:

    • [PROJECT_ID] represents the project ID you retrieved.
  4. Apply the new configuration and exit the Halyard pod.

    hal deploy apply
    exit
    

  5. The deployment takes a few minutes to complete. To check the progress, run the command watch kubectl -n spinnaker get pods. When the deployment is complete, the command shows all the pods as Ready 1/1. To stop the watch command, press Ctrl+C.

    NAME                               READY  STATUS   RESTARTS AGE
    minio-deployment-7c665c4b57-prl6d  1/1    Running  0        1h
    spin-clouddriver-6c4f954667-8769c  1/1    Running  0        1h
    spin-deck-7d44499f9b-hkqz4         1/1    Running  0        1h
    spin-echo-6cf4bbbbfc-vxzlr         1/1    Running  0        1h
    spin-front50-7666c894c6-fm7sz      1/1    Running  0        1h
    spin-gate-76f789696d-vsn98         1/1    Running  0        1h
    spin-halyard-59fd54bd69-vb99h      1/1    Running  0        1h
    spin-kayenta-84f6b9b697-5krhh      1/1    Running  0        1m
    spin-orca-78f5c74c6f-srl4f         1/1    Running  0        1h
    spin-redis-6d49c9c5b9-gddgv        1/1    Running  0        1h
    spin-rosco-699cb484f7-grthh        1/1    Running  0        1h
    

Configure the automatic canary analysis feature

Now that Kayenta is enabled, configure it for sampleapp.

  1. In Spinnaker, click Config.

  2. In the Features section, select Canary, and then click Save Changes.

    Screenshot of features for the pipeline

Create a canary configuration

In Spinnaker, an automated canary analysis runs a statistical test on different metrics and outputs a score. This score can range from 0 to 100 and reflects the number of metrics that pass or fail the comparison between the baseline and the canary. You can influence the score by placing metrics in different groups and giving each group its own weight. Depending on the score of the analysis, you can decide whether to go ahead with the deployment. If you use a single metric, as in this tutorial, the score can only be 0 (fail) or 100 (pass).

An app can have several canary configurations that can be shared across several apps. A canary configuration has two main elements:

  • A set of metrics to analyze (possibly in different groups).
  • Marginal and pass thresholds for the score.

In a deployment pipeline, a canary configuration is used during the Canary Analysis stage. This stage can include several canary runs. If the score of any run is below the marginal threshold, the stage is stopped and the other runs are not executed. The last run's score needs to be above the pass threshold for the whole stage to be considered successful.
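
As an illustration of how grouping and weighting can combine, assume a hypothetical configuration with two groups instead of this tutorial's single metric, and assume that the final score is the weighted average of each group's pass ratio. If a group weighted 60 has 3 of 4 metrics pass and a group weighted 40 has both of its metrics pass, the score is 85: above a marginal threshold of 75, so the stage isn't stopped early, but below a pass threshold of 95, so the stage fails if this is the final run.

    # Hypothetical two-group score, assuming a weighted average of pass ratios.
    echo "60*(3/4) + 40*(2/2)" | bc -l    # prints 85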

To create a canary configuration, follow these steps:

  1. Now that canary is enabled, reload Spinnaker. The Pipelines section is replaced with Delivery. In the Delivery section, go to Canary Configs.

  2. Click Add Configuration.

  3. For Configuration Name, enter kayenta-test.

  4. In the Filter Templates section, click Add Template.

  5. In the Add Template dialog, add the following values, and then click OK:

    • Name: http_code
    • Template: metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("${scope}")

    The scope variable is populated at runtime with the name of the GKE Deployment that Kayenta should check metrics for. For the baseline, it is sampleapp-baseline, and for the canary, it is sampleapp-canary. The resolved filter is shown in the example after these steps.

  6. In the Metrics section, click Add Metric.

  7. In the Add Metric dialog, enter the following values, and then click OK:

    • Name: error_rate
    • Fail on: increase
    • Filter Template: http_code
    • Metric type: external.googleapis.com/prometheus/requests
  8. In the Scoring section, select the following values:

    • Marginal: 75
    • Pass: 95
    • Group 1: 100
  9. Click Save Changes.
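
At runtime, for the canary scope, the http_code filter template you defined in step 5 resolves to the following Stackdriver filter (with sampleapp-baseline substituted instead when the baseline is analyzed):

    metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("sampleapp-canary")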

Add a canary analysis stage to the pipeline

Now that you have a canary configuration, modify your existing deployment pipeline to replace the Manual Judgment stage with a Canary Analysis stage that uses this configuration.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, click Configure.

    Screenshot of configure button for the canary deploy

  2. Click Add Stage.

  3. For Type, select Canary Analysis.

  4. In the Depends On section, modify your new stage to depend on the following selections:

    • Deploy Canary
    • Deploy Baseline
  5. In the Extended Params section, click Add Field and add a parameter with a key of perSeriesAligner and a value of ALIGN_RATE.

  6. Fill in the Canary Analysis Configuration section with the following values:

    • Config Name: kayenta-test. The name of the canary configuration you created earlier.
    • Delay: 0. The time the app is given to warm up before the analysis starts.
    • Interval: 5. The time window that Kayenta uses to run a single statistical analysis.
    • Baseline: sampleapp-baseline. The GKE Deployment that Kayenta uses as the baseline.
    • Baseline Location: default. The GKE namespace in which the baseline runs.
    • Canary: sampleapp-canary. The GKE Deployment that Kayenta uses as the canary.
    • Canary Location: default. The GKE namespace in which the canary runs.
    • Lifetime: 0 hours 5 minutes. How long the canary analysis should last.
    • Resource Type: k8s_container. The type of resource that the canary analysis runs against. This is used when querying Stackdriver for metrics.
    • Metrics Account: kayenta-tutorial. The account that Kayenta uses to query for metrics. Here, it's the Google account you configured earlier to give Spinnaker access to Stackdriver.
    • Storage Account: kayenta-minio. The account that Kayenta uses to store the files it needs, such as canary reports.

  7. In the Execution Options section, select Ignore the failure. You ignore the failure so you can destroy the baseline and the canary even if the canary analysis failed. Later in the tutorial, you modify the stages to take a potential canary failure into account.

  8. In the pipeline's schema, click Deploy to Production.

    Screenshot of Deploy to Production button for the pipeline

  9. Change the Depends On section as follows:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  10. To ensure that you deploy to production only if the canary analysis succeeds, change the Conditional on Expression parameter.

    ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
    

  11. In the pipeline's schema, click Delete Canary, and change the Depends On section to the following parameters:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  12. In the pipeline's schema, click Delete Baseline, and change the Depends On section.

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  13. To ensure that the whole pipeline fails if the canary analysis fails, in the pipeline's schema, click Successful deployment, and then for the existing precondition click the Edit icon.

    Edit the existing precondition of the successful deployment

    1. Change the Expression to the following:

      ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
      

    2. Click Update.

  14. Finish replacing the Manual Judgment stage with the newly created Canary Analysis stage.

    1. In the pipeline's schema, click Manual Judgment.
    2. Click Remove stage.
  15. Click Save Changes. Your pipeline now looks like the following image: Visualization of the canary analysis pipeline
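
If you want to double-check the result outside the UI, you can fetch the updated pipeline definition through the same gate API endpoint you used earlier and list its stages. The Manual Judgment stage should be gone, replaced by a canary analysis stage; treat this as a quick sanity check rather than a canonical query, because the exact stage type names are Spinnaker's:

    # Retrieve the Canary Deploy pipeline and list its stage names and types.
    curl -s localhost:8080/gate/applications/sampleapp/pipelineConfigs/Canary%20Deploy | \
        jq '.stages[] | {name, type}'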

Test your new pipeline

Now that the automated canary analysis is configured, test the pipeline to ensure it behaves as expected.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, or Automated Canary Deploy if you used the CLI, click Start Manual Execution.

  2. Select a Success Rate of 60 and then click Run.

  3. To check the current progress of the canary analysis, click Canary Analysis, and then click Task Status. After a few minutes, the Canary Analysis stage fails, because the canary's success rate of 60 is lower than the success rate of 80 that is currently in production. When the Canary Analysis stage fails, go to the report for this canary analysis.

    1. Click Canary Analysis.
    2. Click Canary Summary.
    3. Click the Report icon. On the report page, the error rate is higher for the canary version than it is for the baseline version.

      Report icon for the canary analysis summary

  4. Repeat the steps in this section, but select a Success Rate of 90 for a successful canary analysis.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the checkbox next to the project you want to delete, and then click Delete project.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you want to keep the GCP project you used in this tutorial, delete the individual resources:

  1. Uninstall Spinnaker.

    kubectl delete -f https://spinnaker.io/downloads/kubernetes/quick-install.yml
    

  2. Delete the GKE cluster.

    gcloud container clusters delete kayenta-tutorial
    

  3. When prompted for confirmation, type Y.

What's next
