Automating Canary Analysis on Google Kubernetes Engine with Spinnaker

This tutorial guides you through configuring the automated canary analysis feature of Spinnaker on Google Kubernetes Engine (GKE).


Spinnaker is an open source, continuous delivery system led by Netflix and Google to manage the deployment of apps on different computing platforms, including App Engine, GKE, Compute Engine, AWS, and Azure. Using Spinnaker, you can implement advanced deployment methods, including canary deployments.

In a canary deployment, you expose a new version of your app to a small portion of your production traffic and analyze its behavior before going ahead with the full deployment. This lets you mitigate risks before deploying a new version to all of your users. To use canary deployments, you must accurately compare the behavior of the old and new versions of your app. The differences can be subtle and might take some time to appear. You might also have a lot of different metrics to examine. Read more about the canary pattern in Application deployment and testing strategies.

To solve those problems, Spinnaker has an automated canary analysis feature: it reads the metrics of both versions from your monitoring system and runs a statistical analysis to automate the comparison. This tutorial shows you how to do an automated canary analysis on an app deployed on GKE and monitored by Cloud Monitoring.

Spinnaker is an advanced app deployment and management platform for organizations with complex deployment scenarios, often with a dedicated release engineering function. You can run this tutorial without prior Spinnaker experience. However, automated canary analysis in production is generally implemented by teams that already have Spinnaker experience and a strong monitoring system, and that know how to determine whether a release is safe.

About this tutorial

The app in this tutorial is a simple "Hello World" app whose error rate is configured with an environment variable. A pre-built Docker image for this app is provided. As illustrated in the following image, the app exposes metrics in the format used by Prometheus, an open source monitoring system that is popular in the Kubernetes community and compatible with Cloud Monitoring.

Architecture of app


Objectives

  • Install Spinnaker for Google Cloud.
  • Deploy an app to GKE without a canary deployment.
  • Configure and run a canary deployment of the app.
  • Configure the automated canary analysis.
  • Test the automated canary analysis.


Before you begin

  1. Select or create a Google Cloud project.

    Go to Manage Resources

  2. Enable billing for your project.

    Enable billing

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Deploy Spinnaker for Google Cloud using Cloud Shell

In this section, you configure the infrastructure required to complete the tutorial. Run all the terminal commands in this tutorial from Cloud Shell.

Spinnaker for Google Cloud gives you a way to set up and manage Spinnaker in a production-ready configuration, optimized for Google Cloud. Spinnaker for Google Cloud sets up many resources (GKE, Memorystore, Cloud Storage buckets and service accounts) required to run Spinnaker in Google Cloud, integrates Spinnaker with related services such as Cloud Build, and provides a Cloud Shell-based management environment for your Spinnaker installations, with helpers and common tools such as spin and hal.

  1. In Cloud Shell, open Spinnaker for Google Cloud. This clones the Spinnaker for Google Cloud repository into your Cloud Shell environment and launches the detailed installation instructions.

    Go to Cloud Shell

  2. Install Spinnaker for Google Cloud:

    PROJECT_ID=${DEVSHELL_PROJECT_ID} ~/cloudshell_open/spinnaker-for-gcp/scripts/install/
  3. Install the Monitoring-Prometheus integration plugin:

    export KUBE_NAMESPACE=prometheus
    export DATA_DIR=/prometheus/
    export DATA_VOLUME=prometheus-storage-volume
    export SIDECAR_IMAGE_TAG=0.7.0
    export GCP_REGION=us-east1-c
    export KUBE_CLUSTER=spinnaker-1
    export PROMETHEUS_VER_TAG=latest
    kubectl create namespace ${KUBE_NAMESPACE}
    kubectl apply -f
    kubectl apply -f
    curl -sS | \
      envsubst | \
      kubectl apply -f -
  4. Restart Cloud Shell to load new environment settings.

    Cloud Shell restart menu option.

  5. Connect to Spinnaker:

  6. In Cloud Shell, select the Web Preview icon and select Preview on port 8080.

    Cloud Shell Restart Option in Menu

Deploying an app with Spinnaker

In this section, you configure Spinnaker to deploy an app in the GKE cluster.

Create a Spinnaker app

Before you deploy, you create the Spinnaker app.

  1. In Spinnaker, select Actions, and then select Create Application.

    Create application drop-down menu

  2. In the New Application dialog, enter the following values:

    • Name: sampleapp
    • Owner Email: [your email address]

  3. Select Create.

You are now in the sampleapp application in Spinnaker. It isn't configured yet, so most of the tabs are empty.

Create and run a deployment pipeline

In this section, you first deploy the app with a simple Spinnaker pipeline that takes a successRate parameter to create a GKE Deployment with four Pods. Those Pods throw errors randomly at a rate that depends on the successRate parameter. In this tutorial, they return HTTP 500 errors for (100 - successRate) percent of requests.
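
As a rough illustration of that relationship, the following shell sketch estimates how many requests out of a given total would fail for a given successRate. The function name expected_errors is hypothetical and is not part of the sample app or of Spinnaker; it only restates the arithmetic above.

```shell
# Hypothetical helper: estimate failed requests for a given success rate.
# Usage: expected_errors SUCCESS_RATE TOTAL_REQUESTS
expected_errors() {
  success_rate=$1
  total=$2
  # The app returns HTTP 500 for (100 - successRate) percent of requests.
  echo $(( total * (100 - success_rate) / 100 ))
}

expected_errors 70 1000   # prints 300
```

Because the Pods throw errors randomly, the observed count over a short window will only be close to this estimate.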

  1. In Cloud Shell, create the pipeline with the provided JSON file. The following command posts the JSON definition of the pipeline directly to the Spinnaker API.

    cd ~
    sed "s/my-kubernetes-account/spinnaker-install-account/g" simple-deploy.json > updated-simple-deploy.json
    spin pipeline save --file updated-simple-deploy.json
  2. In the Pipelines section of Spinnaker, a pipeline called Simple deploy appears. If you don't see it, reload the page. Select Start Manual Execution.

    Start manual execution of simple deploy pipeline

  3. In the Confirm Execution window, select a Success Rate of 70, and then select Run. After a few seconds, the pipeline successfully deploys the configuration of the app and four Pods.

  4. In Cloud Shell, create a Pod that makes requests to your new app until the end of the tutorial.

    kubectl -n default run injector --generator=run-pod/v1 --image=alpine:3.10 -- \
        /bin/sh -c "apk add --no-cache curl; \
        while true; do curl -sS --max-time 3 \
        http://sampleapp:8080/; done"

Check the logs of the injector

  1. To see the behavior of the app, check the logs of the injector:

    kubectl -n default logs -f \
        $(kubectl -n default get pods -l run=injector \
        -o=jsonpath='{.items[0].metadata.name}')
  2. A high number of Internal Server Error messages appear in the logs. To stop following the logs of the injector, press Ctrl+C.
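
If you prefer a number to a visual impression, a small filter can estimate the error percentage from a sample of the injector output. The sketch below is only an illustration: the helper name error_percent is hypothetical, and it assumes that each response occupies one line and that failed requests contain the text Internal Server Error.

```shell
# Hypothetical helper: percentage of input lines that report a server error.
# Assumes one response per line, which may not hold for every curl output.
error_percent() {
  awk '/Internal Server Error/ { errors++ }
       END { if (NR > 0) printf "%d\n", errors * 100 / NR; else print 0 }'
}

# Example with four sample lines, one of which is an error:
printf 'ok\nInternal Server Error\nok\nok\n' | error_percent   # prints 25
```

In the tutorial environment, you could feed it a bounded sample, for example by piping kubectl -n default logs --tail=100 -l run=injector into error_percent. With a success rate of 70, a value around 30 would be expected.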

Check the health of your app

Now that your app is deployed and serves traffic, see if it's behaving correctly. Of course, in this tutorial, you already know that it isn't because you deployed the app with only a 70% success rate.

The app exposes a /metrics endpoint with metrics in the Prometheus format that are ingested by Monitoring. In this section, you visualize those metrics in Monitoring.

  1. In the Google Cloud console, go to Monitoring.

    Go to Monitoring

  2. If Metrics Explorer is shown in the navigation pane, select Metrics Explorer. Otherwise, select Resources and then select Metrics Explorer.

  3. Ensure Configuration is the selected tab.

  4. Select the box labeled Metric, and enter

  5. To refine the graph, in the Group By field, enter code.

    In the following graph, the rates of HTTP requests answered by the app are grouped by HTTP status code:

    Graph of HTTP requests answered by the app.

    If you don't have any data in Monitoring, or if you can't find the metric, wait a few minutes for the data to be ingested by Monitoring before reloading Metrics Explorer.

    As you can see in the graph, the app currently has an unacceptable error rate—around 30%, as expected. The rest of the tutorial guides you through the setup of a canary deployment pipeline and an automatic analysis to prevent future deployments of an app with such a high error rate.

Creating a canary deployment

In this section, you create a canary deployment pipeline, without automated analysis, to test the new version of the app before deploying it fully to production. For simplicity, the pipeline you create in this section relies on Kubernetes load-balancing to send traffic to the canary version. As a consequence, you are not able to choose which portion of the traffic is routed to the canary. To implement advanced traffic routing policies, you can use Istio.
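
To get a feel for what relying on Kubernetes load-balancing implies, the following sketch approximates the canary's traffic share from Pod counts. It assumes that requests are spread evenly across all Pods behind the Service and that the pipeline runs one canary Pod and one baseline Pod next to the four production Pods; the helper name canary_traffic_percent is hypothetical.

```shell
# Hypothetical helper: approximate share of traffic (in percent) that reaches
# the canary when requests are spread evenly across all matching Pods.
# Usage: canary_traffic_percent CANARY_PODS TOTAL_PODS
canary_traffic_percent() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.1f\n", c * 100 / t }'
}

# 4 production Pods + 1 baseline Pod + 1 canary Pod = 6 Pods in total.
canary_traffic_percent 1 6   # prints 16.7
```

Under these assumptions, the only way to change the canary's share is to change the number of Pods, which is why finer control calls for a traffic routing layer.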

In the following image, different stages of this pipeline are outlined:

Illustration of the stages of a canary deployment pipeline.

  • Step 0: Like in the Simple Deploy pipeline, the pipeline takes a Success Rate parameter as input. This new pipeline uses this parameter to simulate different success rates. This is the Configuration of the pipeline.

  • Step 1: The Find Baseline Version stage retrieves the current version of the app running in production from the latest execution of the Simple Deploy pipeline. In this tutorial, it retrieves the success rate of the currently deployed app.

    In parallel with the Find Baseline Version stage, the Deploy Canary Config stage deploys the new success rate configuration for the canary version of the app.

  • Step 2: The Deploy Canary and Deploy Baseline stages deploy the two versions for comparison, the new canary version and a baseline version. The canary version uses the configuration created in Deploy Canary Config, whereas the baseline version uses the configuration used by the production version.

  • Step 3: The Manual Judgment stage stops the pipeline until you continue. During this stage, you can check if the canary version behaves correctly.

  • Step 4: After you continue past the Manual Judgment stage, both the Delete Canary and Delete Baseline stages clean up the infrastructure.

    In parallel with the cleanup, the Deploy to Production stage is launched and triggers the Simple Deploy pipeline with the same Success Rate parameter that you gave initially. The same version of the app that you tested in a canary is deployed in production.

    The Deploy to Production stage is triggered only if you chose to Continue during the Manual Judgment stage.

  • Step 5: Finally, the Successful Deployment stage validates that the whole pipeline is successful. It checks that you gave the go-ahead in the Manual Judgment stage and executes only if the Deploy to Production, Delete Canary, and Delete Baseline stages ran successfully.

Now, you can create and run the Canary Deploy pipeline.

  1. To create the Canary Deploy pipeline, run the following command to fetch the ID of the Simple deploy pipeline and inject it into the Canary Deploy pipeline:

    cd ~
    export PIPELINE_ID=$(spin pipeline get -a sampleapp -n 'Simple deploy' | jq -r '.id')
    jq '(.stages[] | select(.refId == "9") | .pipeline) |= env.PIPELINE_ID | (.stages[] | select(.refId == "8") | .pipeline) |= env.PIPELINE_ID' canary-deploy.json | \
        sed "s/my-kubernetes-account/spinnaker-install-account/g" > updated-canary-deploy.json
    spin pipeline save --file updated-canary-deploy.json
  2. If you don't see the Canary Deploy pipeline in Spinnaker, reload the sampleapp page, and select Pipelines.

  3. To launch the Canary Deploy pipeline:

    1. Select Start Manual Execution.
    2. Select a Success Rate of 80.
    3. Select Run.
  4. When the pipeline reaches the Manual Judgment stage, don't select Continue yet because you need to compare the canary version with the baseline version.

    Manual judgment stage of the canary pipeline.

  5. In Cloud Shell, run the kubectl -n default get pods command to see the new Pods labeled canary and baseline:

    NAME                                READY STATUS  RESTARTS  AGE
    injector-66bd655ffd-9ntwx           1/1   Running 0         30m
    sampleapp-5cdf8f55dd-995rz          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-dqq8n          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-ntq57          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-rlpzp          1/1   Running 0         28m
    sampleapp-baseline-567b8d6849-gsgqr 1/1   Running 0          4m
    sampleapp-canary-54b9759dd6-gmjhc   1/1   Running 0          4m
  6. In the Google Cloud console, go to Monitoring.

    Go to Monitoring

  7. If Metrics Explorer is shown in the navigation pane, select Metrics Explorer. Otherwise, select Resources and then select Metrics Explorer.

  8. Ensure Configuration is the selected tab.

  9. To display the error rate for both the baseline and the canary, specify the following parameters:

    1. Metric:
    2. In the Group By field, enter code.

    If Monitoring is missing some data, wait a few minutes for it to appear.

  10. Compare the canary version (purple in the following graph) with the baseline version (blue in the following graph). Colors might differ in your graph. In this tutorial, the canary version has a lower error rate than the baseline version. Therefore, it is safe to fully deploy the canary version to production. If the canary version didn't have a lower error rate, you might want to stop the deployment at this stage and make some corrections to your app.

    Graph that compares the canary error rate with the baseline version.

  11. In Spinnaker, in the Manual Judgment dialog, select Continue.

  12. When the deployment is finished, go back to Monitoring.

    Go to Monitoring

  13. If Metrics Explorer is shown in the navigation pane, select Metrics Explorer. Otherwise, select Resources and then select Metrics Explorer.

  14. Ensure Configuration is the selected tab.

  15. Select the box labeled Metric, and then select from the menu or enter the name for the resource and metric. Use the following information to complete the fields for this text box:

    1. For the Metric, select or enter
    2. In the Group By field, enter code.

    In the following graph, the rate of HTTP requests answered by the app is split by HTTP status code:

    Graph that compares the rate of HTTP requests.

    This graph shows the rate of HTTP codes, 200 and 500, for all Pods: production, baseline and canary. Because the canary version had a lower error rate, you deployed it in production. After a short period of time during the deployment, where the total number of requests is slightly lower, you can see that the overall error rate is lowered: the canary version has correctly been deployed in production.

Automating canary analysis

A canary deployment is useful, but the way it's currently configured, it's a manual process. You have to manually check that the canary behaves as you want before doing a full deployment, and the difference between canary and baseline isn't always clear.

Automating the canary analysis is a good idea: you don't have to do it yourself, and an automated statistical analysis is better suited than humans to detect problems in a set of metrics. In this section, the Manual Judgment stage is replaced by an automated canary analysis.

Enable canary support

First, in Spinnaker you configure the automated canary analysis feature, called Kayenta. To configure Kayenta, use Halyard, the same tool used to configure and deploy Spinnaker.

  1. Configure Kayenta to use Monitoring as a backend:

    hal config canary google enable
    hal config canary google account add kayenta-tutorial --project $DEVSHELL_PROJECT_ID
    hal config canary google edit --stackdriver-enabled=true
  2. Apply the new configuration:


    The deployment takes a few minutes to complete.

Configure the automatic canary analysis feature

Now that Kayenta is enabled, configure it for sampleapp.

  1. In Spinnaker, select Config.

  2. In the Features section, select Canary, and then select Save Changes.

    Screenshot of features for the pipeline

Create a canary configuration

In Spinnaker, an automated canary analysis runs a statistical test on different metrics and outputs a score. This score ranges from 0 to 100 and reflects the share of metrics that pass the comparison between the baseline and the canary. You can influence the score by placing metrics in different groups, with a different weight for each group. Depending on the score of the analysis, you might want to go ahead with the deployment or not. If you use a single metric, as in this tutorial, the score can only be 0 (fail) or 100 (pass).
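
The effect of groups and weights can be sketched with a simplified weighted sum. This is an approximation for illustration only, not Kayenta's implementation: it assumes that each group contributes its weight multiplied by the fraction of its metrics that pass, and that the group weights add up to 100. The helper name canary_score is hypothetical.

```shell
# Hypothetical helper: simplified weighted canary score.
# Reads one line per group from stdin: WEIGHT PASSED_METRICS TOTAL_METRICS
# Assumes the group weights add up to 100.
canary_score() {
  awk '{ score += $1 * $2 / $3 } END { printf "%.0f\n", score }'
}

# Single metric in a single group with weight 100, as in this tutorial:
printf '100 1 1\n' | canary_score   # prints 100
printf '100 0 1\n' | canary_score   # prints 0

# Two groups: weight 70 with 1 of 2 metrics passing, weight 30 with 1 of 1.
printf '70 1 2\n30 1 1\n' | canary_score   # prints 65
```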

An app can have several canary configurations that can be shared across several apps. A canary configuration has two main elements:

  • A set of metrics to analyze (possibly in different groups).
  • Marginal and pass thresholds for the score.

In a deployment pipeline, a canary configuration is used during the Canary Analysis stage. This stage can include several canary runs. If the score of any run is below the marginal threshold, the stage is stopped and the other runs are not executed. The last run's score needs to be above the pass threshold for the whole stage to be considered successful.
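
The following sketch restates that decision rule as a small shell function. It is an illustration under stated assumptions rather than Spinnaker's code: scores are integers, the thresholds 75 and 95 are example values, the pass comparison is treated as "greater than or equal to", and the name canary_verdict is hypothetical.

```shell
# Hypothetical helper: decide the outcome of a Canary Analysis stage.
# Usage: canary_verdict MARGINAL PASS SCORE [SCORE...]
canary_verdict() {
  marginal=$1
  pass=$2
  shift 2
  last=""
  for score in "$@"; do
    # Any run below the marginal threshold stops the stage immediately.
    if [ "$score" -lt "$marginal" ]; then
      echo "FAIL"
      return 0
    fi
    last=$score
  done
  # The last run must reach the pass threshold (treated here as ">=").
  if [ -n "$last" ] && [ "$last" -ge "$pass" ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

canary_verdict 75 95 100 100   # prints PASS
canary_verdict 75 95 100 0     # prints FAIL (second run is below marginal)
canary_verdict 75 95 100 80    # prints FAIL (last run is below pass)
```

Note that an intermediate run may score between the two thresholds without failing the stage; only the last run has to reach the pass threshold.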

To create a canary configuration, follow these steps:

  1. Now that canary is enabled, the Pipelines section is replaced with Delivery (if you don't see the Delivery section, reload Spinnaker). In the Delivery section, go to Canary Configs.
  2. Select Add Configuration.
  3. For Configuration Name, enter kayenta-test.
  4. In the Metrics section, select Add Metric.
  5. In the Add Metric dialog, enter the following values, and then select OK:

    • Name: error_rate
    • Fail on: increase
    • Resource Type: k8s_container
    • Metric type:
    • Aligner: ALIGN_RATE
    • Filter Template: Choose Create new.

      • For the Name of the new Filter Template, enter: http_code
      • For the Template of the new Filter Template, enter: metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("${scope}")
      • Select Save.
  6. In the Scoring section, set Group 1 to 100.

  7. Select Save Changes.
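
The ${scope} placeholder in the filter template is what lets one metric definition serve both the baseline and the canary. The sketch below only mimics that substitution with sed to show the resulting filters; it is not how Kayenta performs the expansion, and the helper name expand_filter is hypothetical. The scope values are the Deployment name prefixes of the baseline and canary Pods.

```shell
# Hypothetical helper: fill the ${scope} placeholder of the http_code template.
# Usage: expand_filter SCOPE
expand_filter() {
  template='metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("${scope}")'
  # [$] matches a literal dollar sign so that sed does not treat it as an anchor.
  printf '%s\n' "$template" | sed 's/[$]{scope}/'"$1"'/'
}

expand_filter sampleapp-canary
# prints: metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("sampleapp-canary")
```

Calling expand_filter sampleapp-baseline yields the matching filter for the baseline, so both queries select only HTTP 500 responses from the Pods of one version.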

Add a canary analysis stage to the pipeline

Now that you have a canary configuration, modify your existing deployment pipeline to replace the Manual Judgment stage with a Canary Analysis stage that uses this configuration.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, select Configure.

    Screenshot of configure button for the canary deploy.

  2. Select Add Stage.

  3. For Type, select Canary Analysis.

  4. In the Depends On section, modify your new stage to depend on the following selections:

    • Deploy Canary
    • Deploy Baseline
  5. Fill in the Canary Analysis Configuration section with the following values:

    Parameter name     Value               Definition
    Analysis Type      Real Time (Manual)  The automatic mode, where canary and baseline are created for you, is not yet available for Kubernetes.
    Config Name        kayenta-test        The name of the canary configuration you created earlier.
    Lifetime           0 hours 5 minutes   How long the canary analysis should last.
    Delay              0                   The time you give to the app to warm up before doing the analysis.
    Interval           5                   The time window Kayenta uses to run a single statistical analysis.
    Baseline           sampleapp-baseline  The GKE Deployment Kayenta uses as baseline.
    Baseline Location  default             The GKE namespace in which the baseline lives.
    Canary             sampleapp-canary    The GKE Deployment Kayenta uses as canary.
    Canary Location    default             The GKE namespace in which the canary lives.
    Marginal           75                  The threshold score for a marginal canary pass.
    Pass               95                  The threshold score for an overall canary pass.
  6. In the Execution Options section, select Ignore the failure. You ignore the failure so you can destroy the baseline and the canary even if the canary analysis failed. Later in the tutorial, you modify the stages to take a potential canary failure into account.

  7. In the pipeline's schema, select Deploy to Production.

    Screenshot of Deploy to Production button for the pipeline

  8. Change the Depends On section as follows:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  9. To ensure that you deploy to production only if the canary analysis succeeds, change the Conditional on Expression parameter.

    ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
  10. In the pipeline's schema, select Delete Canary, and change the Depends On section to the following parameters:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  11. In the pipeline's schema, select Delete Baseline, and change the Depends On section as follows:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  12. To ensure that the whole pipeline fails if the canary analysis fails, in the pipeline's schema, select Successful deployment, and then for the existing precondition select the Edit icon.

    Edit the existing precondition of the successful deployment

    1. Change the Expression to the following:

      ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
    2. Select Update.

  13. Finish replacing the Manual Judgment stage with the newly created Canary Analysis stage.

    1. In the pipeline's schema, select Manual Judgment.
    2. Select Remove stage.
  14. Select Save Changes.

    Your pipeline now looks like the following image:

    Visualization of the canary analysis pipeline.

Test your new pipeline

Now that the automated canary analysis is configured, test the pipeline to ensure it behaves as expected.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, or Automated Canary Deploy if you used the CLI, select Start Manual Execution.

  2. Select a Success Rate of 60 and then select Run.

  3. To check the current progress of the canary analysis, select Canary Analysis, and then select Task Status. After a few minutes, the Canary Analysis stage fails, because the canary's success rate of 60 is lower than the current success rate of 80 in production. When the Canary Analysis stage fails, go to the report for this canary analysis.

    1. Select Canary Analysis.
    2. Select Canary Summary.
    3. Select the Report icon.

      On the report page, the error rate is higher for the canary version than it is for the baseline version.

      Report icon for the canary analysis summary.

  4. Repeat the steps in this section, but select a Success Rate of 90 for a successful canary analysis.
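
The two outcomes in this section can be anticipated with a naive direct comparison of error rates. The sketch below is deliberately simpler than what Kayenta does: Kayenta runs a statistical test on the metric time series, whereas this hypothetical naive_canary_check helper only compares the configured rates for a metric set to fail on increase.

```shell
# Hypothetical helper: naive check for a metric configured with "Fail on: increase".
# Usage: naive_canary_check BASELINE_SUCCESS_RATE CANARY_SUCCESS_RATE
naive_canary_check() {
  baseline_errors=$(( 100 - $1 ))
  canary_errors=$(( 100 - $2 ))
  if [ "$canary_errors" -gt "$baseline_errors" ]; then
    echo "FAIL"
  else
    echo "PASS"
  fi
}

naive_canary_check 80 60   # prints FAIL (40% errors against 20%)
naive_canary_check 80 90   # prints PASS (10% errors against 20%)
```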

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you want to keep the Google Cloud project you used in this tutorial, delete the individual resources:

  1. Delete the GKE cluster.

    gcloud container clusters delete spinnaker-1
  2. When prompted for confirmation, type Y.
