Automating Canary Analysis on Google Kubernetes Engine with Spinnaker

This tutorial guides you through configuring the automated canary analysis feature of Spinnaker on Google Kubernetes Engine (GKE).


Spinnaker is an open source continuous delivery system led by Netflix and Google to manage the deployment of apps on different computing platforms, including App Engine, GKE, Compute Engine, AWS, and Azure. Using Spinnaker, you can implement advanced deployment methods, including canary deployments.

In a canary deployment, you expose a new version of your app to a small portion of your production traffic and analyze its behavior before going ahead with the full deployment. This lets you mitigate risks before deploying a new version to all of your users. To use canary deployments, you must accurately compare the behavior of the old and new versions of your app. The differences can be subtle and might take some time to appear. You might also have a lot of different metrics to examine.

To solve those problems, Spinnaker has an automated canary analysis feature: it reads the metrics of both versions from your monitoring system and runs a statistical analysis to automate the comparison. This tutorial shows you how to do an automated canary analysis on an app deployed on GKE and monitored by Stackdriver.

Spinnaker is an advanced app deployment and management platform for organizations with complex deployment scenarios, often with a dedicated release engineering function. You can run this tutorial without prior Spinnaker experience. However, automated canary analysis in production is generally implemented by teams that already have Spinnaker experience and a strong monitoring system, and that know how to determine whether a release is safe.

About this tutorial

The app in this tutorial is a simple "Hello World" app whose error rate is configured with an environment variable. A pre-built Docker image for this app is provided. As illustrated in the following image, the app exposes metrics in the format of Prometheus, an open source monitoring system that is popular in the Kubernetes community and compatible with Stackdriver.

Architecture of app

Objectives

  • Install Spinnaker for GCP.
  • Deploy an app to GKE without a canary deployment.
  • Configure and run a canary deployment of the app.
  • Configure the automated canary analysis.
  • Test the automated canary analysis.


Before you begin

  1. Select or create a GCP project.


  2. Enable billing for your project.


  3. Create a Stackdriver account.


When you finish this tutorial, you can avoid continued billing by deleting the resources you created. See Cleaning up for more detail.

Deploy Spinnaker for GCP using Cloud Shell

In this section, you configure the infrastructure required to complete the tutorial. Run all the terminal commands in this tutorial from Cloud Shell.

Spinnaker for GCP gives you a way to set up and manage Spinnaker in a production-ready configuration, optimized for GCP. Spinnaker for GCP sets up many resources (GKE, Cloud Memorystore, Cloud Storage buckets and service accounts) required to run Spinnaker in GCP, integrates Spinnaker with related services such as Cloud Build, and provides a Cloud Shell-based management environment for your Spinnaker installations, with helpers and common tools such as spin and hal.

  1. Open Spinnaker for GCP in Cloud Shell. This clones the Spinnaker for GCP repository into your Cloud Shell environment and launches the detailed installation instructions.


  2. Install Spinnaker for GCP.

    PROJECT_ID=${DEVSHELL_PROJECT_ID} ~/spinnaker-for-gcp/scripts/install/
  3. Install the Stackdriver-Prometheus integration plugin.

    kubectl apply --as=admin --as-group=system:masters -f \
    curl -sS "" | \
        sed "s/_stackdriver_project_id:.*/_stackdriver_project_id: $DEVSHELL_PROJECT_ID/" | \
        sed "s/_kubernetes_cluster_name:.*/_kubernetes_cluster_name: spinnaker-1/" | \
        sed "s/_kubernetes_location:.*/_kubernetes_location: us-west1-b/" | \
        kubectl apply -f -
  4. Restart Cloud Shell to load new environment settings.

    Cloud Shell restart menu option.

  5. Connect to Spinnaker.

  6. In Cloud Shell, click the Web Preview icon and select Preview on port 8080.

    Cloud Shell Restart Option in Menu

Deploying an app with Spinnaker

In this section, you configure Spinnaker to deploy an app in the GKE cluster.

Create a Spinnaker app

Before you deploy, you create the Spinnaker app.

  1. In Spinnaker, click Actions > Create Application.

    Create application drop-down menu

  2. In the New Application dialog, enter the following values:

    • Name: sampleapp
    • Owner Email: your email address

  3. Click Create.

You are now in the sampleapp application in Spinnaker. It isn't configured yet, so most of the tabs are empty.

Create and run a deployment pipeline

In this section, you first deploy the app with a simple Spinnaker pipeline that takes a successRate parameter and creates a GKE Deployment with four pods. Those pods randomly return errors at a rate determined by the successRate parameter: in this tutorial, they return HTTP 500 errors for 100 - successRate percent of requests.
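As a quick check on what to expect from the pods, the relationship between the Success Rate parameter and the share of HTTP 500 responses can be written as a small shell helper. This is a sketch: the function names expected_error_pct and expected_errors are hypothetical, and it assumes successRate is an integer percentage from 0 to 100.

```shell
# Sketch: expected share of HTTP 500 responses for a given successRate.
# Assumption: successRate is an integer percentage from 0 to 100.
expected_error_pct() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 100 ]; then
    echo "successRate must be between 0 and 100" >&2
    return 1
  fi
  echo $((100 - $1))
}

# Expected number of HTTP 500 responses out of N requests (integer result).
# Usage: expected_errors SUCCESS_RATE REQUESTS
expected_errors() {
  local pct
  pct=$(expected_error_pct "$1") || return 1
  echo $(($2 * pct / 100))
}
```

For example, expected_error_pct 70 prints 30, which matches the error rate of around 30% that you observe later in this tutorial for a Success Rate of 70.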

  1. In Cloud Shell, create the pipeline with the provided JSON file. The following command posts the JSON definition of the pipeline directly to the Spinnaker API.

    cd ~
    sed "s/my-kubernetes-account/spinnaker-install-account/g" simple-deploy.json > updated-simple-deploy.json
    spin pipeline save --file updated-simple-deploy.json
  2. In the Pipelines section of Spinnaker, a pipeline called Simple deploy appears. If you don't see it, reload the page. Click Start Manual Execution.

    Start manual execution of simple deploy pipeline

  3. In the Confirm Execution window, select a Success Rate of 70, and then click Run. After a few seconds, the pipeline successfully deploys the configuration of the app and four pods.

  4. In Cloud Shell, create a pod that makes requests to your new app until the end of the tutorial.

    kubectl -n default run injector --image=alpine:3.10 -- \
        /bin/sh -c "apk add --no-cache curl; \
        while true; do curl -sS --max-time 3 \
        http://sampleapp:8080/; done"

Check the logs of the injector

  1. To see the behavior of the app, check the logs of the injector.

    kubectl -n default logs -f \
        $(kubectl -n default get pods -l run=injector \
        -o=jsonpath='{.items[0].metadata.name}')
  2. A high number of Internal Server Error messages appear in the logs. To stop following the logs of the injector, press Ctrl+C.
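If you want a rough number instead of scrolling through the logs, the following sketch computes the percentage of log lines that contain Internal Server Error. The function name error_share_pct is hypothetical, and it assumes that each response printed by the injector ends up on its own log line; if responses are not newline-terminated, lines run together and the result is not meaningful.

```shell
# Sketch: percentage of injector log lines that report an HTTP 500.
# Assumption: one response per log line; lines containing
# "Internal Server Error" count as errors, other non-empty lines as successes.
error_share_pct() {
  awk '
    NF == 0 { next }                 # ignore blank lines
    { total++ }
    /Internal Server Error/ { errors++ }
    END {
      if (total == 0) { print 0; exit }
      printf "%d\n", errors * 100 / total
    }'
}
```

You could pipe recent logs into it, for example kubectl -n default logs --tail=500 -l run=injector | error_share_pct. With a Success Rate of 70, a result near 30 is what you would expect.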

Check the health of your app

Now that your app is deployed and serves traffic, see if it's behaving correctly. Of course, in this tutorial, you already know that it isn't, because you deployed the app with only a 70% success rate.

The app exposes a /metrics endpoint with metrics in the Prometheus format that are ingested by Stackdriver. In this section, you visualize those metrics in Stackdriver.

  1. In Stackdriver, go to Metrics Explorer.


  2. In the Find resource type and metric field, enter the following:
  3. To refine the graph, in the Group By field, enter http_code. In the following graph, the rates of HTTP requests answered by the app are grouped by HTTP status code:

    Graph of HTTP requests answered by the app

As you can see in the graph, the app currently has an unacceptable error rate of around 30%, as expected. The rest of the tutorial guides you through setting up a canary deployment pipeline and an automated analysis that prevent future deployments of an app with such a high error rate.

Creating a canary deployment

In this section, you create a canary deployment pipeline, without automated analysis, to test the new version of the app before deploying it fully to production. The following image outlines the stages of this pipeline:

Illustration of the stages of a canary deployment pipeline

  • Step 0: As in the Simple Deploy pipeline, the pipeline takes a Success Rate parameter as input. This new pipeline uses this parameter to simulate different success rates. This is the Configuration of the pipeline.
  • Step 1: The Find Baseline Version stage retrieves the current version of the app running in production from the latest execution of the Simple Deploy pipeline. In this tutorial, it retrieves the success rate of the currently deployed app.
  • In parallel with the Find Baseline Version stage, the Deploy Canary Config stage deploys the new success rate configuration for the canary version of the app.
  • Step 2: The Deploy Canary and Deploy Baseline stages deploy the two versions for comparison, the new canary version and a baseline version. The canary version uses the configuration created in Deploy Canary Config whereas the baseline version uses the configuration used by the production version.

  • Step 3: The Manual Judgment stage stops the pipeline until you continue. During this stage, you can check if the canary version behaves correctly.

  • Step 4: Once you continue past the Manual Judgment stage, both the Delete Canary and Delete Baseline stages clean up the infrastructure.

  • In parallel with the cleanup, the Deploy to Production stage is launched and triggers the Simple Deploy pipeline with the same Success Rate parameter that you gave initially. The same version of the app you tested in a canary is deployed in production.

  • The Deploy to Production stage is triggered only if you chose to continue during the Manual Judgment stage.

  • Step 5: Finally, the Successful Deployment stage validates that the whole pipeline is successful. It checks that you gave the go-ahead in the Manual Judgment stage and only executes if the Deploy to Production, Delete Canary, and Delete Baseline stages executed successfully.
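The gating in Steps 3 to 5 comes down to two conditions. The following shell sketch models them; Spinnaker enforces this itself through stage dependencies and preconditions, so the function names here are hypothetical, and the status value SUCCEEDED mirrors the one used in the pipeline expressions later in this tutorial.

```shell
# Hypothetical model of the gating in Steps 3 to 5 (Spinnaker enforces
# this itself through stage dependencies and preconditions).

# Deploy to Production runs only if you chose to continue.
# Usage: deploy_to_production_runs JUDGMENT   (JUDGMENT: continue or stop)
deploy_to_production_runs() {
  [ "$1" = "continue" ]
}

# Successful Deployment passes only if you continued and the three
# parallel stages all finished with status SUCCEEDED.
# Usage: successful_deployment JUDGMENT PROD DELETE_CANARY DELETE_BASELINE
successful_deployment() {
  if [ "$1" = "continue" ] && [ "$2" = "SUCCEEDED" ] &&
     [ "$3" = "SUCCEEDED" ] && [ "$4" = "SUCCEEDED" ]; then
    echo "SUCCEEDED"
  else
    echo "FAILED"
  fi
}
```

Note that the cleanup stages run in both cases; only production deployment and the final verdict depend on your judgment.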

Now, you can create and run the Canary Deploy pipeline.

  1. To create the Canary Deploy pipeline, run the following command to fetch the ID of the Simple deploy pipeline and inject it into the Canary Deploy pipeline:

    cd ~
    export PIPELINE_ID=$(spin pipeline get -a sampleapp -n 'Simple deploy' | jq -r '.id')
    jq '(.stages[] | select(.refId == "9") | .pipeline) |= env.PIPELINE_ID | (.stages[] | select(.refId == "8") | .pipeline) |= env.PIPELINE_ID' canary-deploy.json | \
        sed "s/my-kubernetes-account/spinnaker-install-account/g" > updated-canary-deploy.json
    spin pipeline save --file updated-canary-deploy.json
  2. If you don't see the Canary Deploy pipeline in Spinnaker, reload the sampleapp page, and click Pipelines.

  3. To launch the Canary Deploy pipeline:

    1. Click Start Manual Execution.
    2. Select a Success Rate of 80.
    3. Click Run.
  4. When the pipeline reaches the Manual Judgment stage, don't click Continue yet because you need to compare the canary version with the baseline version.

    Manual judgement stage of the canary pipeline

  5. In Cloud Shell, run the kubectl -n default get pods command to see the new pods labeled canary and baseline:

    NAME                                READY STATUS  RESTARTS  AGE
    injector-66bd655ffd-9ntwx           1/1   Running 0         30m
    sampleapp-5cdf8f55dd-995rz          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-dqq8n          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-ntq57          1/1   Running 0         28m
    sampleapp-5cdf8f55dd-rlpzp          1/1   Running 0         28m
    sampleapp-baseline-567b8d6849-gsgqr 1/1   Running 0          4m
    sampleapp-canary-54b9759dd6-gmjhc   1/1   Running 0          4m
  6. In Stackdriver, go to Metrics Explorer.


  7. If there are any metrics already configured in Metrics Explorer, remove all existing configuration from the form.

  8. Display the error rate for both the baseline and the canary, specifying the following parameters:

    1. Metric:
    2. Filters:

      • http_code equals 500
      • version different (!=) from prod
  9. Compare the canary version (purple in the following graph) with the baseline version (blue in the following graph). Colors might differ in your graph. In this tutorial, the canary version has a lower error rate than the baseline version. Therefore, it is safe to fully deploy the canary version to production. If the canary version didn't have a lower error rate, you might want to stop the deployment at this stage and make some corrections to your app.

    Graph that compares the canary error rate with the baseline version

  10. In Spinnaker, in the Manual Judgment dialog, click Continue.

    Manual judgement stage of the canary pipeline

  11. When the deployment is finished, in Stackdriver, go back to Metrics Explorer.


  12. If there are any metrics already configured in Metrics Explorer, remove all existing configuration from the form.

  13. In the Find resource type and metric field, enter the following:
  14. In the Group By field, enter http_code. In the following graph, the rate of HTTP requests answered by the app is split by HTTP status code:

    Graph that compares the rate of HTTP requests

    This graph shows the rate of HTTP codes, 200 and 500, for all pods: production, baseline, and canary. Because the canary version had a lower error rate, you deployed it to production. After a short dip in the total number of requests during the deployment, the overall error rate is lower, which shows that the canary version has been correctly deployed to production.
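The small effect of the canary on the overall error rate, and the larger drop after the promotion, can be estimated with a worked example. The sketch below assumes that the sampleapp Service spreads requests roughly evenly across all pods, so each pod contributes equally; the helper name blended_error_pct is hypothetical.

```shell
# Sketch: overall error rate when traffic is spread evenly across pods.
# Assumption: every pod receives roughly the same number of requests.
# Usage: blended_error_pct POD_COUNT:SUCCESS_RATE [POD_COUNT:SUCCESS_RATE ...]
blended_error_pct() {
  local total_pods=0 weighted=0 pair pods rate
  for pair in "$@"; do
    pods=${pair%%:*}
    rate=${pair#*:}
    total_pods=$((total_pods + pods))
    # Each group contributes (pods x its error percentage).
    weighted=$((weighted + pods * (100 - rate)))
  done
  if [ "$total_pods" -eq 0 ]; then
    echo "no pods given" >&2
    return 1
  fi
  # One decimal place, computed with awk to avoid integer truncation.
  awk -v w="$weighted" -v n="$total_pods" 'BEGIN { printf "%.1f\n", w / n }'
}
```

With four production pods and the baseline at a Success Rate of 70, and the canary at 80, blended_error_pct 4:70 1:70 1:80 prints 28.3, so the single canary pod barely moves the overall rate. After the promotion, blended_error_pct 4:80 prints 20.0.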

Automating canary analysis

A canary deployment is useful, but in its current implementation, it's a manual process. You have to manually check that the canary behaves as you want before doing a full deployment, and the difference between canary and baseline isn't always clear.

Automating the canary analysis is a good idea: you don't have to do it yourself, and an automated statistical analysis is better suited than a human to detecting problems in a set of metrics. In this section, you replace the Manual Judgment stage with an automated canary analysis.

Enable canary support

First, configure the automated canary analysis feature of Spinnaker, called Kayenta. To configure Kayenta, use Halyard, the same tool used to configure and deploy Spinnaker.

  1. Configure Kayenta to use Stackdriver as its backend.

    hal config canary google enable
    hal config canary google account add kayenta-tutorial --project $DEVSHELL_PROJECT_ID
    hal config canary google edit --stackdriver-enabled=true
  2. Apply the new configuration.


Configure the automatic canary analysis feature

Now that Kayenta is enabled, configure it for sampleapp.

  1. In Spinnaker, click Config.

  2. In the Features section, select Canary, and then click Save Changes.

    Screenshot of features for the pipeline

Create a canary configuration

In Spinnaker, an automated canary analysis runs a statistical test on different metrics and outputs a score. The score ranges from 0 to 100 and reflects the share of metrics that pass the comparison between the baseline and the canary. You can influence the score by placing metrics in different groups, with a different weight for each group. Depending on the score of the analysis, you might or might not go ahead with the deployment. If you use a single metric, as in this tutorial, the score can only be 0 (fail) or 100 (pass).
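To make the effect of group weights concrete, here is a simplified model of how group results might combine into a score. It assumes that each group's score is the percentage of its metrics that pass and that the group weights add up to 100; Kayenta's actual judge does more than this, and canary_score is a hypothetical name.

```shell
# Simplified model of a canary score (not Kayenta's implementation).
# Assumptions: a group's score is the percentage of its metrics that pass,
# and the group weights add up to 100. Integer arithmetic truncates.
# Usage: canary_score WEIGHT:PASSED:TOTAL [WEIGHT:PASSED:TOTAL ...]
canary_score() {
  local score=0 triple weight rest passed total
  for triple in "$@"; do
    weight=${triple%%:*}
    rest=${triple#*:}
    passed=${rest%%:*}
    total=${rest#*:}
    [ "$total" -gt 0 ] || continue   # skip empty groups
    score=$((score + weight * passed / total))
  done
  echo "$score"
}
```

With a single metric in a group weighted 100, as in this tutorial, canary_score 100:1:1 prints 100 and canary_score 100:0:1 prints 0. With two groups weighted 60 and 40, where one of two and three of four metrics pass, canary_score 60:1:2 40:3:4 prints 60.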

An app can have several canary configurations, and a canary configuration can be shared across several apps. A canary configuration has two main elements:

  • A set of metrics to analyze (possibly in different groups).
  • Marginal and pass thresholds for the score.

In a deployment pipeline, a canary configuration is used during the Canary Analysis stage. This stage can include several canary runs. If the score of any run is below the marginal threshold, the stage is stopped and the other runs are not executed. The last run's score needs to be above the pass threshold for the whole stage to be considered successful.
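The rules in the previous paragraph can be sketched as a small decision function. This is an illustration under stated assumptions, not Spinnaker's code: canary_stage_result is a hypothetical name, and the sketch treats a score equal to the pass threshold as passing.

```shell
# Sketch of the Canary Analysis stage decision (not Spinnaker's code).
# Assumptions: a run scoring below MARGINAL stops the stage immediately;
# otherwise the last run must score at least PASS (>= is assumed here).
# Usage: canary_stage_result MARGINAL PASS SCORE [SCORE ...]
canary_stage_result() {
  local marginal=$1 pass=$2 score last=""
  shift 2
  for score in "$@"; do
    if [ "$score" -lt "$marginal" ]; then
      echo "FAILED"   # remaining runs would not be executed
      return 0
    fi
    last=$score
  done
  if [ -n "$last" ] && [ "$last" -ge "$pass" ]; then
    echo "SUCCEEDED"
  else
    echo "FAILED"
  fi
}
```

With the thresholds used later in this tutorial (Marginal 75, Pass 95), canary_stage_result 75 95 80 100 prints SUCCEEDED, while canary_stage_result 75 95 60 100 prints FAILED because the first run is below the marginal threshold.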

To create a canary configuration, follow these steps:

  1. Now that canary is enabled, the Pipelines section is replaced with Delivery (if you don't see the Delivery section, reload Spinnaker). In the Delivery section, go to Canary Configs.

  2. Click Add Configuration.

  3. For Configuration Name, enter kayenta-test.

  4. In the Metrics section, click Add Metric.

  5. In the Add Metric dialog, enter the following values, and then click OK:

    • Name: error_rate
    • Fail on: increase
    • Resource Type: k8s_container
    • Metric type:
    • Aligner: ALIGN_RATE
    • Filter Template: Choose Create new...
      • For the Name of the new Filter Template, enter: http_code
      • For the Template of the new Filter Template, enter: metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("${scope}")
      • Click Save
  6. In the Scoring section, set Group 1 to 100.

  7. Click Save Changes.
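The ${scope} placeholder in the filter template is replaced with the baseline or canary value that you set in the Canary Analysis stage, and starts_with then matches pod names by prefix. The following sketch reproduces that matching locally so you can see which pods each filter would select; render_filter and pods_in_scope are hypothetical helpers, not part of Spinnaker.

```shell
# Illustration only: Kayenta performs the ${scope} substitution itself.
# Renders the http_code filter template for a given scope.
render_filter() {
  printf 'metric.labels.http_code = "500" AND resource.label.pod_name = starts_with("%s")\n' "$1"
}

# Prefix match equivalent to starts_with(), applied to pod names on stdin.
# Usage: ... | pods_in_scope SCOPE
pods_in_scope() {
  local pod
  while read -r pod; do
    case $pod in
      "$1"*) echo "$pod" ;;
    esac
  done
}
```

For example, kubectl -n default get pods --no-headers -o custom-columns=NAME:.metadata.name | pods_in_scope sampleapp-canary lists only the canary pod. A scope of sampleapp alone would match the production, baseline, and canary pods, so the scopes need to be the full Deployment names, such as sampleapp-baseline and sampleapp-canary.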

Add a canary analysis stage to the pipeline

Now that you have a canary configuration, modify your existing deployment pipeline to replace the Manual Judgment stage with a Canary Analysis stage that uses this configuration.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, click Configure.

    Screenshot of configure button for the canary deploy

  2. Click Add Stage.

  3. For Type, select Canary Analysis.

  4. In the Depends On section, modify your new stage to depend on the following selections:

    • Deploy Canary
    • Deploy Baseline
  5. Fill in the Canary Analysis Configuration section with the following values:

    • Analysis Type: Real Time (Manual). The automatic mode, where the canary and baseline are created for you, is not yet available for Kubernetes.
    • Config Name: kayenta-test. The name of the canary configuration you created earlier.
    • Lifetime: 0 hours 5 minutes. How long the canary analysis lasts.
    • Delay: 0. The time the app is given to warm up before the analysis starts.
    • Interval: 5. The time window Kayenta uses to run a single statistical analysis.
    • Baseline: sampleapp-baseline. The GKE Deployment Kayenta uses as the baseline.
    • Baseline Location: default. The GKE namespace in which the baseline runs.
    • Canary: sampleapp-canary. The GKE Deployment Kayenta uses as the canary.
    • Canary Location: default. The GKE namespace in which the canary runs.
    • Marginal: 75. The threshold score for a marginal canary pass.
    • Pass: 95. The threshold score for an overall canary pass.
  6. In the Execution Options section, select Ignore the failure. Ignoring the failure lets the pipeline destroy the baseline and the canary even if the canary analysis fails. Later in the tutorial, you modify the stages to take a potential canary failure into account.

  7. In the pipeline's schema, click Deploy to Production.

    Screenshot of Deploy to Production button for the pipeline

  8. Change the Depends On section as follows:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  9. To ensure that you deploy to production only if the canary analysis succeeds, set the Conditional on Expression parameter to the following:

    ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
  10. In the pipeline's schema, click Delete Canary, and change the Depends On section to the following parameters:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  11. In the pipeline's schema, click Delete Baseline, and change the Depends On section as follows:

    1. Add Canary Analysis.
    2. Remove Manual Judgment.
  12. To ensure that the whole pipeline fails if the canary analysis fails, in the pipeline's schema, click Successful deployment, and then for the existing precondition click the Edit icon.

    Edit the existing precondition of the successful deployment

    1. Change the Expression to the following:

      ${ #stage('Canary Analysis')['status'].toString() == 'SUCCEEDED'}
    2. Click Update.

  13. Finish replacing the Manual Judgment stage with the newly created Canary Analysis stage.

    1. In the pipeline's schema, click Manual Judgment.
    2. Click Remove stage.
  14. Click Save Changes. Your pipeline now looks like the following image:

    Visualization of the canary analysis pipeline
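The Lifetime and Interval values of the Canary Analysis stage determine how many analysis runs happen. The following sketch assumes that both values are in minutes and that one run happens per full interval within the lifetime; canary_runs is a hypothetical helper.

```shell
# Sketch: number of canary analysis runs implied by Lifetime and Interval.
# Assumptions: both values are in minutes, and one run happens per full
# interval within the lifetime (the warm-up Delay is not counted).
# Usage: canary_runs LIFETIME_MINUTES INTERVAL_MINUTES
canary_runs() {
  if [ "$2" -le 0 ]; then
    echo "interval must be positive" >&2
    return 1
  fi
  echo $(($1 / $2))
}
```

With this tutorial's values (Lifetime 5 minutes, Interval 5), canary_runs 5 5 prints 1. With a single run, the Pass threshold of 95 alone decides the outcome, because any score that reaches 95 is also above the Marginal threshold of 75.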

Test your new pipeline

Now that the automated canary analysis is configured, test the pipeline to ensure it behaves as expected.

  1. Go to Delivery > Pipelines, and for the Canary Deploy pipeline, or Automated Canary Deploy if you used the CLI, click Start Manual Execution.

  2. Select a Success Rate of 60 and then click Run.

  3. To check the current progress of the canary analysis, click Canary Analysis, and then click Task Status. After a few minutes, the Canary Analysis stage fails, because the canary's success rate (60) is lower than the success rate currently in production (80). When the Canary Analysis stage fails, go to the report for this canary analysis.

    1. Click Canary Analysis.
    2. Click Canary Summary.
    3. Click the Report icon. On the report page, the error rate is higher for the canary version than it is for the baseline version.

      Report icon for the canary analysis summary

  4. Repeat the steps in this section, but select a Success Rate of 90 for a successful canary analysis.
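The outcome of both test runs follows from the error rates involved. The sketch below is a naive illustration, not Kayenta's judge, which applies a statistical test to the metric time series rather than comparing two numbers. It assumes that the baseline runs with the production success rate of 80, and expected_verdict is a hypothetical name.

```shell
# Naive illustration only: Kayenta's judge runs a statistical test on the
# metric time series instead of comparing two fixed numbers.
# Usage: expected_verdict BASELINE_SUCCESS_RATE CANARY_SUCCESS_RATE
expected_verdict() {
  local baseline_err=$((100 - $1))
  local canary_err=$((100 - $2))
  # The error_rate metric is configured to fail on an increase.
  if [ "$canary_err" -gt "$baseline_err" ]; then
    echo "FAIL (score 0)"
  else
    echo "PASS (score 100)"
  fi
}
```

expected_verdict 80 60 prints FAIL (score 0), which is below the Marginal threshold of 75, and expected_verdict 80 90 prints PASS (score 100), which reaches the Pass threshold of 95.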

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. In the GCP Console, go to the Manage resources page.


  2. In the project list, select the project you want to delete, and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you want to keep the GCP project you used in this tutorial, delete the individual resources:

  1. Delete the GKE cluster.

    gcloud container clusters delete spinnaker-1 --zone us-west1-b
  2. When prompted for confirmation, type Y.

What's next
