Configure NAT for a GKE Pod CIDR block

Last reviewed 2022-12-16 UTC

This tutorial shows you how to configure a solution in which you assign the Pod CIDR block an RFC 1918 address block that is already in use in an on-premises network, and then use the ip-masq-agent feature to translate (NAT) the Pod CIDR block. This approach effectively hides the Pod IPv4 addresses behind the node IP addresses. You use Terraform to automate the infrastructure build and the Google Cloud CLI to inspect the components shown in the following figure.

Figure 1. Translating a Pod CIDR block in a GKE cluster.
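
The translation itself is performed by the ip-masq-agent, which programs iptables rules on each node from a list of CIDR ranges that must not be masqueraded. The Terraform configuration in this tutorial sets this up for you; the following command is only an optional way to inspect the configuration after the cluster exists (the ConfigMap name and namespace shown are the usual GKE defaults and are assumptions for illustration):

    # Optional: view the ip-masq-agent configuration on the running cluster.
    # Destinations listed under nonMasqueradeCIDRs keep the Pod source IP;
    # traffic to all other destinations is masqueraded behind the node IP.
    kubectl -n kube-system get configmap ip-masq-agent -o yaml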

You set up the following components with Terraform:

  • A Cloud project with a VPC-native GKE cluster that hosts a Hello World app. This app is exposed through an internal load balancer.
  • A subnetwork for the GKE cluster.
  • A subnetwork simulating an on-premises CIDR block.

For a more in-depth discussion of the overall solution and individual components, see NAT for a GKE Pod CIDR block.

This tutorial assumes you are familiar with the following:

  • Linux sysadmin commands
  • GKE
  • Compute Engine
  • NAT
  • Terraform

Objectives

With Terraform, deploy the following:

  • A project with a VPC-native GKE cluster
  • A subnetwork for the GKE cluster
  • A subnetwork simulating an on-premises CIDR block
  • A Hello World app
  • An internal load balancer to expose the Hello World app

With the Google Cloud CLI, do the following:

  • Inspect each solution component.
  • Verify the Hello World app.
  • Verify the translation of the Pod CIDR block from the simulated on-premises machine.

Costs

This tutorial uses the following billable components of Google Cloud:

  • Compute Engine
  • Google Kubernetes Engine

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

In this section, you prepare Cloud Shell, set up your environment variables, and deploy the supporting infrastructure.

Prepare Cloud Shell

  1. In the Google Cloud console, open Cloud Shell.

    Go to Cloud Shell

    You complete most of this tutorial from the Cloud Shell terminal using HashiCorp's Terraform and the Google Cloud CLI.

  2. In Cloud Shell, clone the GitHub repository and change to the local working directory:

    git clone https://github.com/GoogleCloudPlatform/terraform-gke-nat-connectivity.git kam
    cd kam/podnat
    

    The repository contains all the files that you need to complete this tutorial. For a complete description of each file, see the README.md file in the repository.

  3. Make all shell scripts executable:

    sudo chmod 755 *.sh
    
  4. Install Terraform.

  5. Initialize Terraform:

    terraform init
    

    The output is similar to the following:

    ...
    Initializing provider plugins...
    The following providers do not have any version constraints in configuration, so the latest version was installed.
    
    To prevent automatic upgrades to new major versions that may contain breaking changes, it is recommended to add version = "..." constraints to the corresponding provider blocks in configuration, with the constraint strings suggested below.
    
    * provider.google: version = "~> 2.5"
    
    Terraform has been successfully initialized!
    
    You may now begin working with Terraform. Try running "terraform plan" to see any changes that are required for your infrastructure. All Terraform commands should now work.
    
    If you ever set or change modules or backend configuration for Terraform, rerun this command to reinitialize your working directory. If you forget, other commands will detect it and remind you to do so if necessary.
    ...
    

Set environment variables

  1. Set the TF_VAR_org_id variable, replacing your-organization-name with the name of the Google Cloud organization that you want to use in this tutorial:

    export TF_VAR_org_id=$(gcloud organizations list | \
        awk '/your-organization-name/ {print $2}')
    
  2. Verify that the environment variable is set correctly:

    echo $TF_VAR_org_id
    

    The command output lists your numeric organization ID and looks similar to the following:

    ...
    123123123123
    ...
    
  3. Set the remaining environment variables:

    source set_variables.sh
    

    Verify that the environment variables are set correctly:

    env | grep TF_
    

    The output is similar to the following:

    ...
    TF_VAR_zone=us-west1-b
    TF_VAR_cluster_password=ThanksForAllTheFish
    TF_VAR_node_cidr=10.32.1.0/24
    TF_VAR_region=us-west1
    TF_VAR_billing_account=QQQQQQ-XAAAAA-E87690
    TF_VAR_cluster_cidr=192.168.1.0/24
    TF_VAR_org_id=406999999999
    TF_VAR_ilb_ip=10.32.1.49
    TF_VAR_isolated_vpc_pid=ivpc-pid--999999999
    TF_VAR_gcp_user=user@example
    TF_VAR_on_prem_cidr=10.32.2.0/24
    TF_VAR_cluster_username=dolphins
    TF_VAR_pod_cidr=172.16.0.0/16
    ...
    
  4. Create an environment variable file:

    env | grep TF_ | sed 's/^/export /' > TF_ENV_VARS
    

    This command chain redirects the environment variables that you created into a file called TF_ENV_VARS and prepends each variable with the export command. You can use this file to reset the environment variables if your Cloud Shell session is terminated. These variables are used by the Terraform manifests, the shell scripts, and the Google Cloud CLI.

    If you need to reinitialize the variables later, you can run the following command from the directory where the file resides:

    source TF_ENV_VARS
    
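Optionally, you can override any of these defaults before you run Terraform, for example to deploy to a different region and zone. The values below are examples only; if you change them, keep the region and zone consistent with each other:

    # Optional: override the default region and zone (example values).
    export TF_VAR_region=us-central1
    export TF_VAR_zone=us-central1-b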

Deploy supporting infrastructure

  • In Cloud Shell, use Terraform to deploy the supporting infrastructure:

    terraform apply
    

    Terraform prompts for confirmation before making any changes. Answer yes to apply the configuration.

    The terraform apply command instructs Terraform to deploy all the solution's components. To better understand how the infrastructure is declaratively defined, you can read through the Terraform manifests—that is, the files with the .tf extension.
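
    If you want to see which manifests define which resources, or preview what Terraform would change, the following optional commands can help:

    # List the Terraform manifests that declare the infrastructure.
    ls *.tf

    # Show what, if anything, Terraform would still change.
    terraform plan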

Inspecting the supporting infrastructure

You now use the Google Cloud CLI to view and verify the infrastructure that Terraform created. Verification involves running commands that check whether each resource exists and was created correctly.
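
To avoid repeating the --project flag in the commands that follow, you can optionally make the isolated VPC project the default for gcloud. The commands in this tutorial pass the flag explicitly, so this step isn't required:

    # Optional: set the isolated VPC project as the default gcloud project.
    gcloud config set project $TF_VAR_isolated_vpc_pid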

Verify the projects

  1. In Cloud Shell, list the project:

    gcloud projects list | grep ivpc-pid
    

    The output is similar to the following:

    ...
    ivpc-pid--999999999    isolated-vpc-pname    777999333962
    ...
    
  2. List the API status:

    gcloud services list --project=$TF_VAR_isolated_vpc_pid \
        | grep -E "compute|container"
    

    The output is similar to the following:

    ...
    compute.googleapis.com            Compute Engine API
    container.googleapis.com          Google Kubernetes Engine API
    ...
    

Verify the networks and subnetworks

  • In Cloud Shell, verify the networks and subnetworks:

    gcloud compute networks describe ivpc \
        --project=$TF_VAR_isolated_vpc_pid
    gcloud compute networks subnets describe node-cidr \
        --project=$TF_VAR_isolated_vpc_pid \
        --region=$TF_VAR_region
    gcloud compute networks subnets describe simulated-on-prem \
      --project=$TF_VAR_isolated_vpc_pid \
      --region=$TF_VAR_region
    

    The output is similar to the following:

    ...
    kind: compute#network
    name: ivpc
    routingConfig:
      routingMode: GLOBAL
    ...
    subnetworks:
    - https://www.googleapis.com/compute/v1/projects/ivpc-pid--695116665/regions/us-west1/subnetworks/node-cidr
    x_gcloud_bgp_routing_mode: GLOBAL
    ...
    gatewayAddress: 10.32.1.1
    ...
    ipCidrRange: 10.32.1.0/24
    kind: compute#subnetwork
    name: node-cidr
    ...
    secondaryIpRanges:
    - ipCidrRange: 172.16.0.0/16
      rangeName: pod-cidr
    ...
    subnetworks:
    - https://www.googleapis.com/compute/v1/projects/ivpc-pid--695116665/regions/us-west1/subnetworks/simulated-on-prem
    x_gcloud_bgp_routing_mode: GLOBAL
    ...
    gatewayAddress: 10.32.2.1
    ...
    ipCidrRange: 10.32.2.0/24
    kind: compute#subnetwork
    name: simulated-on-prem
    ...
    

Verify the firewall rules

  • In Cloud Shell, verify the firewall rules in the isolated VPC:

    gcloud compute firewall-rules list --project=$TF_VAR_isolated_vpc_pid
    

    The output is similar to the following:

    ...
    NAME                  NETWORK           DIRECTION  PRIORITY  ALLOW  DENY  DISABLED
    allow-rfc1918-in-fwr  isolated-vpc-net  INGRESS    1000      all          False
    allow-ssh-in-fwr      isolated-vpc-net  INGRESS    1000      22           False
    ...
    

Verify the virtual machines

  • In Cloud Shell, verify the virtual machines:

    gcloud compute instances list --project=$TF_VAR_isolated_vpc_pid
    

    The output is similar to the following:

    ...
    NAME                                     ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
    gke-cluster1-default-pool-fc9ba891-4xhj  us-west1-b  n1-standard-1               10.32.1.4    34.83.33.188    RUNNING
    gke-cluster1-default-pool-fc9ba891-d0bd  us-west1-b  n1-standard-1               10.32.1.3    34.83.48.81     RUNNING
    gke-cluster1-default-pool-fc9ba891-xspg  us-west1-b  n1-standard-1               10.32.1.2    35.247.62.159   RUNNING
    simulated-on-prem-host                   us-west1-b  n1-standard-1               10.32.2.2    35.227.173.106  RUNNING
    ...
    

Verify the GKE cluster and its resources

  1. In Cloud Shell, get the cluster credentials:

    gcloud container clusters get-credentials cluster1 \
        --project=$TF_VAR_isolated_vpc_pid \
        --zone $TF_VAR_zone
    

    The output is similar to the following:

    ...
    Fetching cluster endpoint and auth data.
    kubeconfig entry generated for cluster1.
    ...
    
  2. Verify the cluster:

    gcloud container clusters list \
        --project=$TF_VAR_isolated_vpc_pid \
        --zone=$TF_VAR_zone
    

    The output is similar to the following:

    ...
    NAME     LOCATION    MASTER_VERSION MASTER_IP   MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
    cluster1 us-west1-b  1.11.8-gke.6   192.0.2.58  n1-standard-1  1.11.8-gke.6  3          RUNNING
    ...
    
  3. Verify the Hello World app:

    kubectl get deployment my-app
    

    The output is similar to the following:

    ...
    NAME     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    my-app   3         3         3            3           118m
    ...
    
  4. Verify the internal load balancer service:

    kubectl get service hello-server
    

    The output is similar to the following:

    ...
    NAME           TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    hello-server   LoadBalancer   10.32.11.49   <pending>     8080:30635/TCP   3m
    ...
    
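    If the EXTERNAL-IP column still shows <pending>, the internal load balancer is still being provisioned. You can optionally watch the service until the address is assigned (press Control+C to stop watching):

    # Watch the hello-server service until the load balancer address appears.
    kubectl get service hello-server --watch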

Verifying the solution

To verify the solution, you check the following:

  • The ip-masq-agent feature
  • That the Hello World app is externally accessible
  • The Pod CIDR NAT

Verify the ip-masq-agent feature

  1. In Cloud Shell, get a node name:

    export NODE_NAME=$(kubectl get nodes | awk '/gke/ {print $1}' | head -n 1)
    
  2. Use SSH to connect to a cluster node:

    gcloud compute ssh $NODE_NAME \
        --project=$TF_VAR_isolated_vpc_pid \
        --zone=$TF_VAR_zone
    
  3. Verify the ip-masq-agent configuration:

    sudo iptables -t nat -L
    

    The output is similar to the following:

    ...
    Chain IP-MASQ (2 references)
    target      prot opt source               destination
    RETURN      all  --  anywhere             169.254.0.0/16       /* ip-masq-agent: local traffic is not subject to MASQUERADE */
    RETURN      all  --  anywhere             10.32.1.0/24         /* ip-masq-agent: local traffic is not subject to MASQUERADE */
    RETURN      all  --  anywhere             172.16.0.0/16        /* ip-masq-agent: local traffic is not subject to MASQUERADE */
    RETURN      all  --  anywhere             192.168.1.0/24       /* ip-masq-agent: local traffic is not subject to MASQUERADE */
    MASQUERADE  all  --  anywhere             anywhere             /* ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain) */
    ...
    
  4. Exit the SSH session.

    exit
    
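The iptables rules that you inspected on the node are programmed by the ip-masq-agent DaemonSet that runs on every node. As an optional extra check from Cloud Shell (a sketch only; Pod names vary by cluster), confirm that the agent is running:

    # Confirm that an ip-masq-agent Pod is running on each node.
    kubectl -n kube-system get pods -o wide | grep ip-masq-agent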

Verify that the Hello World app is externally accessible

  1. Use SSH to connect to the simulated on-premises VM:

    gcloud compute ssh simulated-on-prem-host \
        --project=$TF_VAR_isolated_vpc_pid \
        --zone=$TF_VAR_zone
    
  2. Verify the Hello World app:

    curl http://10.32.1.49:8080
    

    The output is similar to the following:

    ...
    Hello, world!
    Version: 1.0.0
    Hostname: my-app-77748bfbd8-nqwl2
    ...
    
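    Optionally, repeat the request a few times from the same SSH session. The Hostname line changes as the internal load balancer spreads requests across the three my-app Pods (exact hostnames vary):

    # Send five requests and print only the responding Pod's hostname.
    for i in 1 2 3 4 5; do curl -s http://10.32.1.49:8080 | grep Hostname; done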

Verify the Pod CIDR NAT

  1. On the simulated on-premises VM, run tcpdump:

    sudo tcpdump -n icmp
    

    The output is similar to the following:

    ...
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    ...
    

    This command starts the tcpdump utility on the simulated on-premises VM and captures ICMP packets only, so that you can observe the source IP address of the ping traffic that arrives from the GKE cluster in the following steps.

  2. In the Google Cloud console, open a new Cloud Shell terminal.

    Go to Cloud Shell

  3. From the new terminal, change to the Git repository working directory (kam/podnat) and set the environment variables:

    source TF_ENV_VARS
    
  4. Get the cluster credentials:

    gcloud container clusters get-credentials cluster1 \
        --project=$TF_VAR_isolated_vpc_pid \
        --zone $TF_VAR_zone
    
  5. Get the Pod name:

    export POD_NAME=$(kubectl get pods | awk '/my-app/ {print $1}' | head -n 1)
    
  6. Connect to the Pod shell:

    kubectl exec -it $POD_NAME -- /bin/sh
    
  7. From the Pod shell in the new terminal, ping the simulated on-premises VM:

    ping 10.32.2.2
    

    In the original terminal, the tcpdump output is similar to the following:

    ...
    
    05:43:40.669371 IP 10.32.1.3 > 10.32.2.2: ICMP echo request, id 3328, seq 0, length 64
    05:43:40.669460 IP 10.32.2.2 > 10.32.1.3: ICMP echo reply, id 3328, seq 0, length 64
    ...
    

    Notice that the source IP address is from the node 10.32.1.0/24 CIDR block. The Pod 172.16.0.0/16 CIDR block has been translated behind the node address. Press Control+C to stop the ping and then exit the Pod shell.
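
    Optionally, before you clean up, you can confirm that the address shown by tcpdump is the node address rather than the Pod address. After you exit the Pod shell, run the following in the new terminal:

    # Show the Pod's IP address and the node that it runs on. The Pod IP is in
    # the 172.16.0.0/16 Pod CIDR block, while tcpdump on the on-premises VM
    # showed the node's 10.32.1.0/24 address as the source.
    kubectl get pod $POD_NAME -o wide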

Clean up

Destroy the infrastructure

  1. From the first Cloud Shell terminal, exit from the SSH session to the simulated on-premises VM by typing exit.
  2. Destroy all of the tutorial's components:

    terraform destroy
    

    Terraform prompts for confirmation before making the change. Answer yes to destroy the configuration.

    You might see the following Terraform error:

    ...
    * google_compute_network.ivpc (destroy): 1 error(s) occurred:
    * google_compute_network.ivpc: Error waiting for Deleting Network: The network resource 'projects/ivpc-pid--1058675427/global/networks/isolated-vpc-net' is already being used by 'projects/ivpc-pid--1058675427/global/firewalls/k8s-05693142c93de80e-node-hc'
    ...
    

    This error occurs when the command attempts to destroy the isolated-VPC network before destroying the GKE firewall rules. Run the following script to remove the non-default firewall rules from the isolated VPC:

    ./k8-fwr.sh
    

    The output shows you which firewall rules will be removed.

  3. Review the rules and, when prompted, type yes.

  4. From the first Cloud Shell terminal, reissue the following command:

    terraform destroy
    

    Terraform prompts for confirmation before making the change. Answer yes to destroy the configuration.

  5. From the original Cloud Shell terminal, issue the following commands:

    cd ../..
    rm -rf kam
    

What's next