Using Envoy Proxy to load-balance gRPC services on GKE

This tutorial demonstrates how to expose multiple gRPC services deployed on Google Kubernetes Engine (GKE) on a single external IP address by using Network Load Balancing and Envoy Proxy. The tutorial highlights some of the advanced features that Envoy provides for gRPC.


gRPC is an open source, language-independent RPC framework based on HTTP/2 that uses protocol buffers for efficient on-the-wire representation and fast serialization. Inspired by Stubby, the internal Google RPC framework, gRPC enables low-latency communication between microservices and between mobile clients and APIs.

gRPC runs over HTTP/2 and offers several advantages over HTTP/1.1, such as efficient binary encoding, multiplexing of requests and responses over a single connection, and automatic flow control. gRPC also offers several options for load balancing. This tutorial focuses on situations where clients are untrusted, such as mobile clients and clients running outside the trust boundary of the service provider. Of the load-balancing options that gRPC provides, you use proxy-based load balancing in this tutorial.

In the tutorial, you deploy a Kubernetes Service of type LoadBalancer, which is exposed as transport layer (layer 4) Network Load Balancing on Google Cloud. This service provides a single public IP address and passes TCP connections directly to the configured backends. In this tutorial, the backend is a Kubernetes Deployment of Envoy instances.

Envoy is an open source application layer (layer 7) proxy that offers many advanced features. In this tutorial, you use it to terminate SSL/TLS connections and route gRPC traffic to the appropriate Kubernetes Service. Compared to other application layer solutions such as Kubernetes Ingress, using Envoy directly provides multiple customization options, like the following:

  • Service discovery
  • Load-balancing algorithms
  • Transforming requests and responses—for instance, to JSON or gRPC-Web
  • Authenticating requests by validating JWT tokens
  • gRPC health checks

By combining Network Load Balancing with Envoy, you can set up an endpoint (external IP address) that forwards traffic to a set of Envoy instances running in a GKE cluster. These instances then use application layer information to proxy requests to different gRPC services running in the cluster. The Envoy instances use cluster DNS to identify and load-balance incoming gRPC requests to the healthy and running pods for each service. This means traffic is load-balanced to the pods per RPC request rather than per TCP connection from the client.
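For illustration, an Envoy cluster that discovers pod IP addresses through a headless Service DNS name might be configured like the following hypothetical fragment. The names and structure are assumptions for illustration, not taken from the tutorial's actual envoy.yaml; the port matches the echo-grpc Service port used later in the tutorial.

```yaml
# Hypothetical Envoy cluster definition: STRICT_DNS resolves the headless
# Service name to the individual pod IP addresses, and Envoy load-balances
# each gRPC request across them.
clusters:
- name: echo-grpc
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  http2_protocol_options: {}   # speak HTTP/2 (gRPC) to the backends
  load_assignment:
    cluster_name: echo-grpc
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: echo-grpc.default.svc.cluster.local
              port_value: 8081
```

Because the DNS name resolves to every ready pod IP, each RPC can be dispatched to a different pod, which is what makes per-request (rather than per-connection) load balancing possible.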


This tutorial uses billable components of Google Cloud, including Google Kubernetes Engine (GKE), Cloud Build, and Container Registry.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Cloud Build, Container Registry, and Container Analysis APIs.

    Enable the APIs


In this tutorial, you deploy two gRPC services, echo-grpc and reverse-grpc, in a Google Kubernetes Engine (GKE) cluster and expose them to the internet on a public IP address. The following diagram shows the architecture for exposing these two services through a single endpoint:

architecture for exposing `echo-grpc` and `reverse-grpc` through a single endpoint

Network Load Balancing accepts incoming requests from the internet (for example, from mobile clients or service consumers outside your company). Network Load Balancing performs the following tasks:

  • Load-balances incoming connections to the worker nodes in the pool. Traffic is forwarded to the envoy Kubernetes Service, which is exposed on all worker nodes in the cluster. The Kubernetes network proxy forwards these connections to pods that are running Envoy.
  • Performs HTTP health checks against the worker nodes in the cluster.

Envoy performs the following tasks:

  • Terminates SSL/TLS connections.
  • Discovers pods running the gRPC services by querying the internal cluster DNS service.
  • Routes and load-balances traffic to the gRPC service pods.
  • Performs health checks of the gRPC services according to the gRPC Health Checking Protocol.
  • Exposes an endpoint for health checking by using Network Load Balancing.

The gRPC services (echo-grpc and reverse-grpc) are exposed as Kubernetes headless Services. This means that no clusterIP address is assigned, and the Kubernetes network proxy doesn't load-balance traffic to the pods. Instead, a DNS A record that contains the pod IP addresses is created in the cluster DNS service. Envoy discovers the pod IP addresses from this DNS entry and load-balances across them according to the policy configured in Envoy.
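A headless Service is simply one whose clusterIP is explicitly set to None. The following is a minimal sketch of what a manifest such as echo-service.yaml might contain; the file in the repository is authoritative, and the selector label here is an assumption.

```yaml
# Hypothetical sketch of a headless Service: clusterIP: None means no
# virtual IP is allocated; the cluster DNS returns the pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: echo-grpc
spec:
  clusterIP: None          # makes the Service headless
  selector:
    app: echo-grpc         # assumed pod label
  ports:
  - port: 8081
    protocol: TCP
```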

The following diagram shows the Kubernetes objects involved in this tutorial:

Kubernetes objects used in this tutorial, including services, YAML files, DNS A records, secrets, pods, and proxy entry

Initializing the environment

In this section, you set environment variables that are used later in the tutorial.

  1. Open Cloud Shell:

    GO TO Cloud Shell

    You use Cloud Shell to run all the commands in this tutorial.

  2. In Cloud Shell, display the current project ID:

    gcloud config list --format 'value(core.project)'
  3. If the command does not return the ID of the project you selected, configure Cloud Shell to use your project, replacing project-id with the name of your project:

    gcloud config set project project-id
  4. Define environment variables for the region and zone you want to use for this tutorial:

    REGION=us-central1
    ZONE=us-central1-b

    This tutorial uses the us-central1 region and the us-central1-b zone. However, you can change the region and zone to suit your needs.

Create the GKE cluster

  1. Create a GKE cluster for running your gRPC services:

    gcloud container clusters create grpc-cluster --zone $ZONE
  2. Verify that the kubectl context has been set up by listing the worker nodes in your cluster:

    kubectl get nodes -o name

    The output looks similar to this:


Deploy the gRPC services

To route traffic to multiple gRPC services behind one load balancer, you deploy two simple gRPC services: echo-grpc and reverse-grpc. Both services expose a unary method that takes a string in the content request field. echo-grpc responds with the content unaltered, while reverse-grpc responds with the content string reversed.
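As a quick analogy for what the two gRPC methods return (this is not part of the deployment, just standard shell tools mimicking the behavior):

```shell
# echo-grpc returns the content unaltered; reverse-grpc returns it reversed.
echo 'echo'           # prints: echo
echo 'reverse' | rev  # prints: esrever
```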

  1. Clone the repository containing the gRPC services and switch to the working directory:

    git clone
    cd grpc-gke-nlb-tutorial
  2. Using Cloud Build, create the container images for the Echo and Reverse gRPC services and store them in Container Registry:

    gcloud builds submit -t gcr.io/$GOOGLE_CLOUD_PROJECT/echo-grpc echo-grpc
    gcloud builds submit -t gcr.io/$GOOGLE_CLOUD_PROJECT/reverse-grpc reverse-grpc
  3. Verify that the images exist in Container Registry:

    gcloud container images list --repository gcr.io/$GOOGLE_CLOUD_PROJECT

    The output looks similar to this:

  4. Create Kubernetes Deployments for echo-grpc and reverse-grpc:

    kubectl apply -f k8s/echo-deployment.yaml
    kubectl apply -f k8s/reverse-deployment.yaml
  5. Check that two pods are available for each deployment:

    kubectl get deployments

    The output looks similar to the following. The values for DESIRED, CURRENT, UP-TO-DATE, and AVAILABLE should be 2 for both deployments.

    NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    echo-grpc      2         2         2            2           1m
    reverse-grpc   2         2         2            2           1m
  6. Create Kubernetes headless Services for echo-grpc and reverse-grpc. These commands create DNS A records in the cluster's DNS service but don't allocate virtual IP addresses.

    kubectl apply -f k8s/echo-service.yaml
    kubectl apply -f k8s/reverse-service.yaml
  7. Check that both echo-grpc and reverse-grpc exist as Kubernetes Services:

    kubectl get services

    The output looks similar to the following. Both echo-grpc and reverse-grpc should have TYPE=ClusterIP and CLUSTER-IP=None.

    NAME           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    echo-grpc      ClusterIP   None         <none>        8081/TCP   35s
    kubernetes     ClusterIP     <none>        443/TCP    47m
    reverse-grpc   ClusterIP   None         <none>        8082/TCP   21s

Set up Network Load Balancing

  1. Create a Kubernetes Service of type LoadBalancer in your cluster:

    kubectl apply -f k8s/envoy-service.yaml

    This command provisions the resources required for Network Load Balancing and assigns an ephemeral public IP address. It can take a few minutes to assign the public IP address.

  2. Run the following command and wait until the value for EXTERNAL-IP for the envoy service changes from <pending> to a public IP address:

    kubectl get services envoy --watch
  3. Press Control+C to stop waiting.
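The envoy-service.yaml manifest might look similar to the following sketch. This is a hypothetical reconstruction (the file in the repository is authoritative); in particular, the targetPort and selector label are assumptions, while port 443 matches the address the tutorial's clients connect to.

```yaml
# Hypothetical sketch of envoy-service.yaml: type LoadBalancer provisions
# Network Load Balancing with a public IP that forwards TCP to the Envoy pods.
apiVersion: v1
kind: Service
metadata:
  name: envoy
spec:
  type: LoadBalancer
  selector:
    app: envoy             # assumed pod label
  ports:
  - name: https
    port: 443
    targetPort: 8443       # assumed Envoy listener port
    protocol: TCP
```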

Create a self-signed SSL/TLS certificate

Envoy uses a certificate and key when it's terminating SSL/TLS connections. You start by creating a self-signed SSL/TLS certificate.

  1. Create an environment variable to store the public IP address of the envoy service that you created in the previous section:

    EXTERNAL_IP=$(kubectl get service envoy -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  2. Create a self-signed SSL/TLS certificate and key:

    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -keyout privkey.pem -out cert.pem -subj "/CN=$EXTERNAL_IP"
  3. Create a Kubernetes TLS Secret called envoy-certs that contains the self-signed SSL/TLS certificate and key:

    kubectl create secret tls envoy-certs \
        --key privkey.pem --cert cert.pem \
        --dry-run=client -o yaml | kubectl apply -f -
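If you want to confirm what the certificate contains, you can generate and inspect one locally. The IP address below is a placeholder from the RFC 5737 documentation range, not your load balancer's address:

```shell
# Generate a throwaway self-signed certificate for a placeholder IP,
# then print its subject to confirm the CN matches.
EXTERNAL_IP=203.0.113.10   # placeholder; use your envoy service's IP
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -keyout /tmp/privkey.pem -out /tmp/cert.pem \
    -subj "/CN=$EXTERNAL_IP" 2>/dev/null
openssl x509 -in /tmp/cert.pem -noout -subject
```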

Deploy Envoy

  1. Create a Kubernetes ConfigMap to store the Envoy configuration file (envoy.yaml):

    kubectl apply -f k8s/envoy-configmap.yaml
  2. Create a Kubernetes Deployment for Envoy:

    kubectl apply -f k8s/envoy-deployment.yaml
  3. Verify that two envoy pods are running:

    kubectl get deployment envoy

    The output looks similar to this:

    NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    envoy     2         2         2            2           1m
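For context, the routing portion of the envoy.yaml stored in the ConfigMap might look roughly like the following hypothetical excerpt. It is a sketch, not the actual file: gRPC calls arrive as HTTP/2 requests to /<package>.<Service>/<Method>, so prefix matching on the service names seen later in the tutorial (api.Echo and api.Reverse) is enough to route each service to its own cluster.

```yaml
# Hypothetical routing excerpt: route each gRPC service by its path prefix.
route_config:
  name: grpc_routes
  virtual_hosts:
  - name: backends
    domains: ["*"]
    routes:
    - match: { prefix: "/api.Echo/" }
      route: { cluster: echo-grpc }
    - match: { prefix: "/api.Reverse/" }
      route: { cluster: reverse-grpc }
```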

You are now ready to test the gRPC services.

Test the gRPC services

To test the services, you use the grpcurl command-line tool.

  1. In Cloud Shell, install grpcurl:

    go get
    go install
  2. Send a request to the Echo gRPC service:

    grpcurl -d '{"content": "echo"}' -proto echo-grpc/api/echo.proto \
        -insecure -v $EXTERNAL_IP:443 api.Echo/Echo

    The output looks similar to this:

    Resolved method descriptor:
    rpc Echo ( .api.EchoRequest ) returns ( .api.EchoResponse );
    Request metadata to send:
    Response headers received:
    content-type: application/grpc
    date: Wed, 27 Feb 2019 04:40:19 GMT
    hostname: echo-grpc-5c4f59c578-wcsvr
    server: envoy
    x-envoy-upstream-service-time: 0
    Response contents:
    {
      "content": "echo"
    }
    Response trailers received:
    Sent 1 request and received 1 response

    The hostname response header shows the name of the echo-grpc pod that handled the request. If you repeat the command a few times, you should see two different values for the hostname response header, corresponding to the names of the echo-grpc pods.

  3. Verify the same behavior with the Reverse gRPC service:

    grpcurl -d '{"content": "reverse"}' -proto reverse-grpc/api/reverse.proto \
        -insecure -v $EXTERNAL_IP:443 api.Reverse/Reverse

    The output looks similar to this:

    Resolved method descriptor:
    rpc Reverse ( .api.ReverseRequest ) returns ( .api.ReverseResponse );
    Request metadata to send:
    Response headers received:
    content-type: application/grpc
    date: Wed, 27 Feb 2019 04:45:56 GMT
    hostname: reverse-grpc-74cdc4849f-tvsfb
    server: envoy
    x-envoy-upstream-service-time: 2
    Response contents:
    {
      "content": "esrever"
    }
    Response trailers received:
    Sent 1 request and received 1 response


Troubleshooting

If you run into problems with this tutorial, we recommend that you review the relevant troubleshooting documentation for the products involved.

You can also explore the Envoy administration interface to diagnose problems with the Envoy configuration.

  1. To open the administration interface, set up port forwarding from Cloud Shell to the admin port of one of the Envoy pods:

    kubectl port-forward \
        $(kubectl get pods -o name | grep envoy | head -n1) 8080:8090
  2. Wait until you see this output in the console:

    Forwarding from 127.0.0.1:8080 -> 8090
  3. Click the Web preview button in Cloud Shell and select Preview on port 8080. This opens a new browser window showing the administration interface.

    Envoy admin interface with preview selected

  4. When you are done, switch back to Cloud Shell and press Control+C to end port forwarding.

Alternative ways to route gRPC traffic

You can modify this solution in a number of ways to suit your environment.

Alternative application layer load balancers

Some of the application layer functionality that Envoy provides can also be provided by other load-balancing solutions:

  • You can configure HTTP(S) Load Balancing using a Kubernetes Ingress object and use this instead of Network Load Balancing and Envoy. Using HTTP(S) Load Balancing provides several benefits compared to Network Load Balancing, such as managed SSL/TLS certificates and integration with other Google Cloud products such as Cloud CDN and IAP.

    We recommend that you use HTTP(S) Load Balancing when you don't need support for any of the following:

    • gRPC health checks
    • Fine-grained control over the load-balancing algorithm
    • Exposing more than 50 services

    To learn more about how to deploy HTTP(S) Load Balancing with a sample gRPC service, see the Google Kubernetes Engine documentation on Ingress and the GKE gRPC Ingress LoadBalancing tutorial on GitHub.

  • If you use Istio, you can use its features to route and load-balance gRPC traffic. Istio's Ingress Gateway is deployed as Network Load Balancing with an Envoy backend, similar to the architecture in this tutorial. The main difference is that the Envoy Proxy is configured through Istio's traffic routing objects. To make the example services in this tutorial routable in the Istio service mesh, you must remove the line clusterIP: None from the Kubernetes Service manifests (echo-service.yaml and reverse-service.yaml). This means using the service discovery and load balancing functionality of Istio instead of the similar functionality in Envoy. If you already use Istio, we recommend using the Ingress Gateway to route to your gRPC services.

  • You can use NGINX in place of Envoy, either as a Deployment or using the NGINX Ingress Controller for Kubernetes. Envoy is used in this tutorial because it provides more advanced gRPC functionality, such as support for the gRPC health checking protocol.

  • You can use Ambassador or Contour, both of which provide Kubernetes Ingress controllers and are based on Envoy.

  • You can use Voyager, which is a Kubernetes Ingress Controller based on HAProxy.

Internal VPC network connectivity

If you want to expose the services outside your GKE cluster but only inside your VPC network, you can use Internal TCP/UDP Load Balancing in place of Network Load Balancing. To do so, add the annotation cloud.google.com/load-balancer-type: "Internal" to the envoy-service.yaml manifest.
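A hypothetical sketch of how that annotation fits into the Service manifest (field values other than the annotation are assumptions):

```yaml
# Hypothetical excerpt: this annotation requests an internal load balancer
# instead of an external one.
apiVersion: v1
kind: Service
metadata:
  name: envoy
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
```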

Envoy deployment versus DaemonSet

In this tutorial, Envoy is configured as a Kubernetes Deployment. This configuration means that the replica setting in the deployment manifest determines the number of Envoy pods. If the load balancer forwards incoming requests to a worker node that isn't running an Envoy pod, the Kubernetes network proxy forwards the request to a worker node that's running an Envoy pod.

DaemonSet is an alternative to deployment for Envoy. With a DaemonSet, an Envoy pod runs on every worker node in the GKE cluster. This alternative means higher resource usage in large clusters (more Envoy pods), but it also means that incoming requests always reach a worker node that's running an Envoy pod. The result is less network traffic in your cluster and lower average latency, because requests are not forwarded between worker nodes to reach an Envoy pod.
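As a sketch, converting the Envoy Deployment to a DaemonSet mostly means changing the object kind and dropping the replica count. The skeleton below is hypothetical (the labels and image tag are assumptions, and the full pod template from the tutorial's envoy-deployment.yaml is elided):

```yaml
# Hypothetical skeleton: a DaemonSet schedules one Envoy pod per worker node,
# so there is no replicas field.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: envoy
spec:
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.17.0   # hypothetical image tag
```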

Cleaning up

After you've finished the current tutorial, you can clean up the resources that you created on Google Cloud so they won't take up quota and you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you want to keep the Google Cloud project you used in this tutorial, delete the individual resources:

  1. In Cloud Shell, delete the local Git repository clone:

    cd ; rm -rf ~/grpc-gke-nlb-tutorial
  2. Delete the images in Container Registry:

    gcloud container images list-tags gcr.io/$GOOGLE_CLOUD_PROJECT/echo-grpc \
        --format 'value(digest)' | xargs -I {} gcloud container images delete \
        --force-delete-tags --quiet gcr.io/$GOOGLE_CLOUD_PROJECT/echo-grpc@sha256:{}
    gcloud container images list-tags gcr.io/$GOOGLE_CLOUD_PROJECT/reverse-grpc \
        --format 'value(digest)' | xargs -I {} gcloud container images delete \
        --force-delete-tags --quiet gcr.io/$GOOGLE_CLOUD_PROJECT/reverse-grpc@sha256:{}
  3. Delete the Google Kubernetes Engine cluster:

    gcloud container clusters delete grpc-cluster --zone $ZONE --quiet --async

What's next