Using Envoy Proxy to load-balance gRPC services on GKE

This tutorial demonstrates how to expose multiple gRPC services deployed on Google Kubernetes Engine (GKE) on a single external IP address by using Network Load Balancing and Envoy Proxy. The tutorial highlights some of the advanced features that Envoy provides for gRPC.


gRPC is an open source, language-independent RPC framework based on HTTP/2 that uses protocol buffers for efficient on-the-wire representation and fast serialization. Inspired by Stubby, the internal Google RPC framework, gRPC enables low-latency communication between microservices and between mobile clients and APIs.

gRPC runs over HTTP/2 and offers several advantages over HTTP/1.1, such as efficient binary encoding, multiplexing of requests and responses over a single connection, and automatic flow control. gRPC also offers several options for load balancing. This tutorial focuses on situations where clients are untrusted, such as mobile clients and clients running outside the trust boundary of the service provider. Of the load-balancing options that gRPC provides, you use proxy-based load balancing in this tutorial.

In the tutorial, you deploy a Kubernetes Service of TYPE=LoadBalancer, which is exposed as transport layer (layer 4) Network Load Balancing on Google Cloud. This service provides a single public IP address and passes TCP connections directly to the configured backends. In the tutorial, the backend is a Kubernetes Deployment of Envoy instances.

Envoy is an open source application layer (layer 7) proxy that offers many advanced features. In this tutorial, you use it to terminate TLS connections and route gRPC traffic to the appropriate Kubernetes Service. Compared to other application layer solutions such as Kubernetes Ingress, using Envoy directly provides multiple customization options, like the following:

  • Service discovery
  • Load-balancing algorithms
  • Transforming requests and responses—for instance, to JSON or gRPC-Web
  • Authenticating requests by validating JWT tokens
  • gRPC health checks

By combining Network Load Balancing with Envoy, you can set up an endpoint (external IP address) that forwards traffic to a set of Envoy instances running in a GKE cluster. These instances then use application layer information to proxy requests to different gRPC services running in the cluster. The Envoy instances use cluster DNS to identify and load-balance incoming gRPC requests to the healthy and running pods for each service. This means traffic is load-balanced to the pods per RPC request rather than per TCP connection from the client.
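
The difference between balancing per TCP connection and per RPC request can be illustrated with a small simulation. This is a hypothetical sketch, not part of the tutorial repository: the distribute function and the pod names are invented for illustration, and round-robin is assumed as the policy (the actual policy is whatever is configured in Envoy).

```shell
# Hypothetical simulation; function and pod names are illustrative only.
# Usage: distribute MODE REQUESTS PODS
#   per-connection: every RPC follows the client's single TCP connection to one pod.
#   per-rpc:        each RPC goes to the next pod in turn (round-robin is assumed).
distribute() (
  mode=$1; requests=$2; pods=$3
  i=0
  while [ "$i" -lt "$requests" ]; do
    if [ "$mode" = "per-rpc" ]; then
      target=$((i % pods))
    else
      target=0
    fi
    echo "pod-$target"
    i=$((i + 1))
  done | sort | uniq -c | awk '{print $2 "=" $1}'
)
```

For example, distribute per-connection 6 2 reports all six requests on one pod, while distribute per-rpc 6 2 reports three requests on each of the two pods.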


In this tutorial, you deploy two gRPC services, echo-grpc and reverse-grpc, in a Google Kubernetes Engine (GKE) cluster and expose them to the internet on a public IP address. The following diagram shows the architecture for exposing these two services through a single endpoint:

[Figure: Architecture for exposing echo-grpc and reverse-grpc through a single endpoint]

Network Load Balancing accepts incoming requests from the internet (for example, from mobile clients or service consumers outside your company). Network Load Balancing performs the following tasks:

  • Load-balances incoming connections to the nodes in the pool. Traffic is forwarded to the envoy Kubernetes Service, which is exposed on all nodes in the cluster. The Kubernetes network proxy forwards these connections to pods that are running Envoy.
  • Performs HTTP health checks against the nodes in the cluster.

Envoy performs the following tasks:

  • Terminates TLS connections.
  • Discovers pods running the gRPC services by querying the internal cluster DNS service.
  • Routes and load-balances traffic to the gRPC service pods.
  • Performs health checks of the gRPC services according to the gRPC Health Checking Protocol.
  • Exposes an endpoint that Network Load Balancing uses for health checks.

The gRPC services (echo-grpc and reverse-grpc) are exposed as Kubernetes headless Services. This means that no clusterIP address is assigned, and the Kubernetes network proxy doesn't load-balance traffic to the pods. Instead, a DNS A record that contains the pod IP addresses is created in the cluster DNS service. Envoy discovers the pod IP addresses from this DNS entry and load-balances across them according to the policy configured in Envoy.

The following diagram shows the Kubernetes objects involved in this tutorial:

[Figure: Kubernetes objects used in this tutorial, including services, YAML files, DNS A records, secrets, pods, and proxy entry]


This tutorial uses billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  3. In the Google Cloud console, activate Cloud Shell.

Preparing the environment

  1. In Cloud Shell, set the Cloud project that you want to use for this tutorial:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with your Cloud project ID. When you run this command, Cloud Shell creates an exported environment variable called GOOGLE_CLOUD_PROJECT that contains your project ID.

  2. Enable the Google Kubernetes Engine API:

    gcloud services enable

Creating the GKE cluster

  1. Create a GKE cluster for running your gRPC services:

    gcloud container clusters create grpc-cluster \
        --enable-ip-alias \
        --release-channel regular \
        --scopes cloud-platform \
        --workload-pool $ \
        --zone us-central1-f

    This tutorial uses the us-central1-f zone. You can use a different zone or region.

  2. Verify that the kubectl context has been set up by listing the nodes in your cluster:

    kubectl get nodes --output=name

    The output looks similar to this:


Deploying the gRPC services

To route traffic to multiple gRPC services behind one load balancer, you deploy two sample gRPC services: echo-grpc and reverse-grpc. Both services expose a unary method that takes a string in the content request field. echo-grpc responds with the content unaltered, while reverse-grpc responds with the content string reversed.

  1. Clone the repository containing the gRPC services and switch to the repository directory:

    git clone ~/grpc-gke-nlb-tutorial
    cd ~/grpc-gke-nlb-tutorial
  2. Create a self-signed TLS certificate and private key:

    openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 365 \
        -keyout privkey.pem -out cert.pem -extensions san \
        -config \
        <(echo "[req]";
          echo distinguished_name=req;
          echo "[san]";
         ) \
        -subj '/'
  3. Create a Kubernetes Secret called envoy-certs that contains the self-signed TLS certificate and private key:

    kubectl create secret tls envoy-certs --key=privkey.pem --cert=cert.pem \
        --dry-run=client --output=yaml | kubectl apply --filename -

    Envoy uses this TLS certificate and private key when it terminates TLS connections.

  4. Build the container images for the sample apps echo-grpc and reverse-grpc, publish the images to Container Registry, and deploy Envoy and both sample apps to the GKE cluster by using Skaffold:

    skaffold run$GOOGLE_CLOUD_PROJECT

    Skaffold is an open source tool from Google that automates the workflow for building, pushing, and deploying applications as containers.

  5. Verify that two pods are ready for each deployment:

    kubectl get deployments

    The output looks similar to the following. The values for READY should be 2/2 for all deployments.

    echo-grpc      2/2     2            2           1m
    envoy          2/2     2            2           1m
    reverse-grpc   2/2     2            2           1m
  6. Verify that echo-grpc, envoy, and reverse-grpc exist as Kubernetes Services:

    kubectl get services

    The output looks similar to the following. Both echo-grpc and reverse-grpc should have TYPE=ClusterIP and CLUSTER-IP=None.

    NAME           TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)         AGE
    echo-grpc      ClusterIP      None          <none>           8081/TCP        2m
    envoy          LoadBalancer      443:31516/TCP   2m
    reverse-grpc   ClusterIP      None          <none>           8082/TCP        2m
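
The two verification steps above can also be checked mechanically. The following is a minimal sketch, assuming the column layout shown in the sample output. The helper names all_ready and headless_services are hypothetical and are not part of the tutorial repository.

```shell
# Hypothetical helpers; they read kubectl output on stdin.

# Succeeds only if every deployment row reports READY as 2/2.
# Input: output of `kubectl get deployments --no-headers`.
all_ready() {
  awk '$2 != "2/2" { bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Prints the names of Services whose CLUSTER-IP column is None (headless).
# Input: output of `kubectl get services --no-headers`.
headless_services() {
  awk '$3 == "None" { print $1 }'
}

# Example usage against a live cluster:
#   kubectl get deployments --no-headers | all_ready && echo "all deployments ready"
#   kubectl get services --no-headers | headless_services
```

Note that all_ready also succeeds on empty input, so run it only after the deployments have been created.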

Test the gRPC services

To test the services, you use the grpcurl command-line tool.

  1. In Cloud Shell, install grpcurl:

    go install
  2. Get the external IP address of the envoy Kubernetes Service and store it in an environment variable:

    EXTERNAL_IP=$(kubectl get service envoy \
  3. Send a request to the echo-grpc sample app:

    grpcurl -d '{"content": "echo"}' -proto echo-grpc/api/echo.proto \
        -authority -cacert cert.pem -v \
        $EXTERNAL_IP:443 api.Echo/Echo

    The output looks similar to this:

    Resolved method descriptor:
    rpc Echo ( .api.EchoRequest ) returns ( .api.EchoResponse );
    Request metadata to send:
    Response headers received:
    content-type: application/grpc
    date: Wed, 02 Jun 2021 07:18:22 GMT
    hostname: echo-grpc-75947768c9-jkdcw
    server: envoy
    x-envoy-upstream-service-time: 3
    Response contents:
    {
      "content": "echo"
    }
    Response trailers received:
    Sent 1 request and received 1 response

    The hostname response header shows the name of the echo-grpc pod that handled the request. If you repeat the command a few times, you should see two different values for the hostname response header, corresponding to the names of the echo-grpc pods.

  4. Verify the same behavior with the reverse-grpc sample app:

    grpcurl -d '{"content": "reverse"}' -proto reverse-grpc/api/reverse.proto \
        -authority -cacert cert.pem -v \
        $EXTERNAL_IP:443 api.Reverse/Reverse

    The output looks similar to this:

    Resolved method descriptor:
    rpc Reverse ( .api.ReverseRequest ) returns ( .api.ReverseResponse );
    Request metadata to send:
    Response headers received:
    content-type: application/grpc
    date: Wed, 02 Jun 2021 07:20:15 GMT
    hostname: reverse-grpc-5c9b974f54-wlfwt
    server: envoy
    x-envoy-upstream-service-time: 1
    Response contents:
    {
      "content": "esrever"
    }
    Response trailers received:
    Sent 1 request and received 1 response
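
To confirm that requests are spread across both pods without comparing outputs by eye, you can tally the hostname response header over several calls. The following is a minimal sketch. The count_hostnames helper is hypothetical, and it assumes the verbose output format shown above.

```shell
# Hypothetical helper: reads concatenated `grpcurl -v` output on stdin and
# prints "POD_NAME COUNT" for each distinct value of the hostname header.
count_hostnames() {
  sed -n 's/^[[:space:]]*hostname:[[:space:]]*//p' | sort | uniq -c | awk '{print $2, $1}'
}

# Example usage, where send_echo is a hypothetical wrapper around the
# grpcurl command from step 3:
#   for i in 1 2 3 4 5 6; do send_echo; done 2>&1 | count_hostnames
```

If per-request load balancing is working, the tally should list both pod names of the service rather than a single one.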


If you run into problems with this tutorial, you can explore the Envoy administration interface to diagnose problems with the Envoy configuration.

  1. To open the administration interface, set up port forwarding from Cloud Shell to the admin port of one of the Envoy pods:

    kubectl port-forward \
        $(kubectl get pods -o name | grep envoy | head -n1) 8080:8090
  2. Wait until you see this output in the console:

    Forwarding from -> 8090
  3. Click the Web preview button in Cloud Shell and select Preview on port 8080. This opens a new browser window showing the administration interface.

    [Figure: Envoy admin interface with preview selected]

  4. When you are done, switch back to Cloud Shell and press Control+C to end port forwarding.
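
While port forwarding is active, the administration interface can also be queried from a second Cloud Shell tab instead of the browser. The following sketch summarizes the health of each upstream host from the text output of Envoy's /clusters admin endpoint. The host_health helper is hypothetical, and the cluster names you see depend on the Envoy configuration in the tutorial repository.

```shell
# Hypothetical helper: reads the text output of Envoy's /clusters admin
# endpoint on stdin and prints "CLUSTER HOST HEALTH_FLAGS" per upstream host.
host_health() {
  awk -F '::' '$3 == "health_flags" { print $1, $2, $4 }'
}

# Example usage while the port forwarding from step 1 is active:
#   curl -s http://localhost:8080/clusters | host_health
```

A host whose flags read healthy is passing health checks. Any other value, such as /failed_active_hc, indicates that Envoy has marked the pod as unhealthy.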

Alternative ways to route gRPC traffic

You can modify this solution in a number of ways to suit your environment.

Alternative application layer load balancers

Some of the application layer functionality that Envoy provides can also be provided by other load-balancing solutions:

  • You can configure HTTP(S) Load Balancing using a Kubernetes Ingress object and use this instead of Network Load Balancing and Envoy. Using HTTP(S) Load Balancing provides several benefits compared to Network Load Balancing, such as managed TLS certificates and integration with other Google Cloud products such as Cloud CDN and IAP.

    We recommend that you use HTTP(S) Load Balancing when you don't need support for any of the following:

    • gRPC health checks
    • Fine-grained control over the load-balancing algorithm
    • Exposing more than 50 services

    To learn more about how to deploy HTTP(S) Load Balancing with a sample gRPC service, see the Google Kubernetes Engine documentation on Ingress and the GKE gRPC Ingress LoadBalancing tutorial on GitHub.

  • If you use Anthos Service Mesh or Istio, you can use their features to route and load-balance gRPC traffic. Both Anthos Service Mesh and Istio provide an Ingress Gateway that is deployed as Network Load Balancing with an Envoy backend, similar to the architecture in this tutorial. The main difference is that the Envoy Proxy is configured through Istio's traffic routing objects. To make the example services in this tutorial routable in the Anthos Service Mesh or Istio service mesh, you must remove the line clusterIP: None from the Kubernetes Service manifests (echo-service.yaml and reverse-service.yaml). This means using the service discovery and load balancing functionality of Anthos Service Mesh or Istio instead of the similar functionality in Envoy. If you already use Anthos Service Mesh or Istio, we recommend using the Ingress Gateway to route to your gRPC services.

  • You can use NGINX in place of Envoy, either as a Deployment or using the NGINX Ingress Controller for Kubernetes. Envoy is used in this tutorial because it provides more advanced gRPC functionality, such as support for the gRPC health checking protocol.

Internal VPC network connectivity

If you want to expose the services outside your GKE cluster but only inside your VPC network, you can use Internal TCP/UDP Load Balancing in place of Network Load Balancing. To do so, add the annotation "Internal" to the envoy-service.yaml manifest.

Envoy Deployment versus DaemonSet

In this tutorial, Envoy is configured as a Kubernetes Deployment. This configuration means that the replicas setting in the Deployment manifest determines the number of Envoy pods. If the load balancer forwards incoming requests to a node that isn't running an Envoy pod, the Kubernetes network proxy forwards the request to a node that's running an Envoy pod.

A DaemonSet is an alternative to a Deployment for Envoy. With a DaemonSet, an Envoy pod runs on every node in the GKE cluster. This alternative means higher resource usage in large clusters (more Envoy pods), but it also means that incoming requests always reach a node that's running an Envoy pod. The result is less network traffic in your cluster and lower average latency, because requests are not forwarded between nodes to reach an Envoy pod.

Clean up

After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you want to keep the Google Cloud project you used in this tutorial, delete the individual resources:

  1. In Cloud Shell, delete the local Git repository clone:

    cd ; rm -rf ~/grpc-gke-nlb-tutorial
  2. Delete the images in Container Registry:

    gcloud container images list-tags$GOOGLE_CLOUD_PROJECT/echo-grpc \
        --format 'value(digest)' | xargs -I {} gcloud container images delete \
        --force-delete-tags --quiet$GOOGLE_CLOUD_PROJECT/echo-grpc@sha256:{}
    gcloud container images list-tags$GOOGLE_CLOUD_PROJECT/reverse-grpc \
        --format 'value(digest)' | xargs -I {} gcloud container images delete \
        --force-delete-tags --quiet$GOOGLE_CLOUD_PROJECT/reverse-grpc@sha256:{}
  3. Delete the GKE cluster:

    gcloud container clusters delete grpc-cluster --zone us-central1-f --async

What's next