Implement flat-mode network model with BGP support

This document describes how to implement a flat-mode network model with Border Gateway Protocol (BGP) support. When you implement a network model with BGP support, BGP dynamically ensures that pods in different Layer 2 domains can communicate with each other. Flat-mode networking with BGP is sometimes called dynamic flat IP.

For more information about flat-mode network models, see Flat vs island mode network models.

How to implement a flat-mode network that uses BGP

Flat-mode networking with BGP is enabled when you create a new cluster. You can't enable this feature for an existing cluster. Once this feature is enabled, you can make changes to some of the configuration settings.

To implement a cluster on a flat-mode network model with BGP support:

  1. Edit the cluster configuration file:

    • Set the spec.clusterNetwork.advancedNetworking field to true.
    • If you want to enable flat-mode networking for IPv4, set the spec.clusterNetwork.flatIPv4 field to true.

      For an alternative, see Dual-stack cluster (IPv4 Island, IPv6 Dynamic Flat IP), which configures your cluster with flat-mode networking for IPv6 only.

    apiVersion: baremetal.cluster.gke.io/v1
    kind: Cluster
    metadata:
      name: bm
      namespace: cluster-bm
    spec:
      type: user
      ...
      clusterNetwork:
        advancedNetworking: true
        flatIPv4: true
      ...
    

    When spec.clusterNetwork.flatIPv4 is set to true, the spec.clusterNetwork.pods.cidrBlocks field is ignored and can be omitted. However, you must add a ClusterCIDRConfig manifest to the cluster configuration file (per node, per node pool, or per cluster).

  2. Append a NetworkGatewayGroup manifest to the cluster configuration file:

    Specify the floating IPs to use for BGP peering. Ensure that the resource name is default and the namespace is the cluster namespace.

    ---
    apiVersion: networking.gke.io/v1
    kind: NetworkGatewayGroup
    metadata:
      name: default
      namespace: cluster-bm
    spec:
      floatingIPs:
      - 10.0.1.100
      - 10.0.2.100
    

    The NetworkGatewayGroup custom resource manages a list of one or more floating IP addresses. The BGP peering sessions are initiated from floating IP addresses that you specify in the NetworkGatewayGroup custom resource.

  3. Append a FlatIPMode manifest to the cluster configuration file:

    The name of the FlatIPMode resource must be default, and the resource must be in the cluster namespace. The peerSelector value flatip-peer: "true" matches the labels in BGPPeer objects bgppeer1 and bgppeer2 (defined in the following step), so both peers are used for flat-mode networking.

    The following FlatIPMode manifest is for IPv4 single-stack, flat-mode networking with BGP. For alternative configurations, see Configuration examples.

    ---
    apiVersion: baremetal.cluster.gke.io/v1alpha1
    kind: FlatIPMode
    metadata:
      name: default
      namespace: cluster-bm
    spec:
      enableBGPIPv4: true
      enableBGPIPv6: false
      peerSelector:
        flatip-peer: "true"
    
  4. Append one or more BGPPeer manifests to the cluster configuration file:

    You choose the names for the resources, but all BGPPeer resources must be in the cluster namespace.

    ---
    apiVersion: networking.gke.io/v1
    kind: BGPPeer
    metadata:
      name: bgppeer1
      namespace: cluster-bm
      labels:
        flatip-peer: "true"
    spec:
      localASN: 65001
      peerASN: 65000
      peerIP: 10.0.1.254
      sessions: 2
    ---
    apiVersion: networking.gke.io/v1
    kind: BGPPeer
    metadata:
      name: bgppeer2
      namespace: cluster-bm
      labels:
        flatip-peer: "true"
    spec:
      localASN: 65001
      peerASN: 65000
      peerIP: 10.0.2.254
      sessions: 2
    
  5. Append a ClusterCIDRConfig manifest to the cluster configuration file:

    The ClusterCIDRConfig resource must also be in the cluster namespace.

    ---
    apiVersion: baremetal.cluster.gke.io/v1alpha1
    kind: ClusterCIDRConfig
    metadata:
      name: cluster-wide-1
      namespace: cluster-bm
    spec:
      ipv4:
        cidr: "192.168.0.0/16"
        perNodeMaskSize: 24
    

    ClusterCIDRConfig is a custom resource that specifies Pod CIDR ranges to be allocated to nodes dynamically. The CNI uses the Pod CIDR ranges allocated on a Node to allocate IP addresses to the individual Pods running on the Node. The ClusterCIDRConfig is also used for dual-stack networking. For more information about the ClusterCIDRConfig custom resource, including usage examples, see Understand the ClusterCIDRConfig custom resource.

  6. Create the cluster:

    bmctl create cluster
    

    For more information about creating clusters, see Cluster creation overview.

    If your environment supports multi-protocol BGP (MP-BGP), IPv4 and IPv6 routes can be advertised over these IPv4 sessions. For examples of different configurations, including examples that use MP-BGP, see Configuration examples.
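
Before you run bmctl create cluster, it can help to confirm that the cluster configuration file contains every manifest from the preceding steps. The following is a minimal sketch, not a bmctl feature: the function name check_flat_bgp_manifests is hypothetical, and the check is a plain text search for top-level kind: lines rather than a YAML parse, so it doesn't validate field values.

```shell
# Hypothetical helper: report which flat-mode BGP manifests are missing
# from a cluster configuration file. Plain text matching, not YAML parsing.
# $1: path to the cluster configuration file.
check_flat_bgp_manifests() {
  config_file="$1"
  missing=0
  for kind in Cluster NetworkGatewayGroup FlatIPMode BGPPeer ClusterCIDRConfig; do
    # Match a top-level "kind: <Kind>" line, allowing trailing spaces or a comment.
    if ! grep -Eq "^kind: ${kind}([[:space:]]|\$)" "$config_file"; then
      echo "missing manifest: ${kind}"
      missing=1
    fi
  done
  # Return 0 only when all five kinds were found.
  return "$missing"
}
```

For example, check_flat_bgp_manifests CLUSTER_CONFIG_FILE prints one line per missing manifest and returns a nonzero status if any are missing, where CLUSTER_CONFIG_FILE is a placeholder for the path to your configuration file.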

Modify your BGP-based flat-mode networking configuration

After you create a cluster that uses a flat-mode network model with BGP, you can update some configuration settings. Use the admin cluster kubeconfig file when you update the BGP-related resources (NetworkGatewayGroup, FlatIPMode, and BGPPeer). The admin cluster then reconciles the changes to the user cluster. If you edit these resources directly on the user cluster, the admin cluster overwrites your changes in subsequent reconciliations.
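
As a sketch of that workflow, the following helper lists the BGP-related resources in the cluster namespace through the admin cluster. The resource names networkgatewaygroups, flatipmodes, and bgppeers are assumed plural forms of the kinds shown in this document, so confirm them with kubectl api-resources before relying on them. The function name list_bgp_resources and the KUBECTL override variable are hypothetical.

```shell
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the command
# without contacting a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# List the BGP-related resources for a user cluster through the admin cluster.
# $1: path to the admin cluster kubeconfig file.
# $2: cluster namespace (cluster-bm in the samples in this document).
# Resource names are assumed plurals of the kinds used in this document.
list_bgp_resources() {
  "$KUBECTL" get networkgatewaygroups,flatipmodes,bgppeers \
    --namespace "$2" --kubeconfig "$1"
}
```

Calling list_bgp_resources ADMIN_KUBECONFIG cluster-bm shows the objects that the admin cluster reconciles to the user cluster. Make any edits with the same kubeconfig file.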

Configuration examples

The following sections include cluster configuration examples for different variations of the flat-mode network model with BGP. The sample configuration files aren't complete. Most cluster settings that aren't relevant to flat-mode networking with BGP have been omitted.

Single-stack IPv4 cluster

The following cluster configuration file sample shows the settings for configuring a single-stack IPv4 cluster with flat-mode networking with BGP:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: bm
  namespace: cluster-bm
spec:
  ...
  clusterNetwork:
    advancedNetworking: true
    flatIPv4: true
    services:
      cidrBlocks:
      - 10.96.0.0/12
  ...
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: ClusterCIDRConfig          
metadata:
  name: cluster-wide-1
  namespace: cluster-bm          # Must match the cluster namespace
spec:
  ipv4:
    cidr: "222.2.0.0/16"
    perNodeMaskSize: 24
---
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  name: default
  namespace: cluster-bm           # Must match the cluster namespace
spec:
  floatingIPs:
  - 10.0.1.100
  - 10.0.3.100
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: FlatIPMode
metadata:
  name: default
  namespace: cluster-bm            # Must match the cluster namespace
spec:
  enableBGPIPv4: true
  enableBGPIPv6: false
  peerSelector:
    flatipmode-peer: "true"
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer1
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.1.254
  sessions: 2
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer2
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.3.254
  sessions: 2
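
To see how the ClusterCIDRConfig values in this sample translate into capacity, the following sketch works through the arithmetic for cidr: "222.2.0.0/16" with perNodeMaskSize: 24. The function names are hypothetical, and the result is only an upper bound on the number of per-node blocks in the range. It doesn't describe how the allocator hands out blocks.

```shell
# Number of per-node blocks that fit in a CIDR range.
# $1: prefix length of the range (16 for 222.2.0.0/16).
# $2: perNodeMaskSize.
per_node_blocks() {
  echo $(( 1 << ($2 - $1) ))
}

# Number of addresses in one per-node block.
# $1: perNodeMaskSize.
# $2: address width in bits (32 for IPv4, 128 for IPv6).
addresses_per_block() {
  echo $(( 1 << ($2 - $1) ))
}

per_node_blocks 16 24        # prints 256
addresses_per_block 24 32    # prints 256
```

The same functions apply to the IPv6 values used in the dual-stack samples: per_node_blocks 112 120 and addresses_per_block 120 128 each print 256.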

Dual-stack cluster (IPv4 Island, IPv6 Dynamic Flat IP)

The following cluster configuration file sample shows the settings for configuring a dual-stack (IPv4/IPv6) cluster with flat-mode networking with BGP for just IPv6:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: bm
  namespace: cluster-bm
spec:
  ...
  clusterNetwork:
    advancedNetworking: true
    flatIPv4: false
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
      # Additional IPv6 CIDR block determines if the cluster is dual-stack
      - 2620:0:1000:2630:5:2::/112
  ... 
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: ClusterCIDRConfig          
metadata:
  name: cluster-wide-1
  namespace: cluster-bm          # Must match the cluster namespace
spec:
  ipv4:
    cidr: "192.168.0.0/16"
    perNodeMaskSize: 24
  ipv6:
    cidr: "2222:3::/112"
    perNodeMaskSize: 120
---
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  name: default
  namespace: cluster-bm           # Must match the cluster namespace
spec:
  floatingIPs:
  - 10.0.1.100
  - 10.0.3.100
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: FlatIPMode
metadata:
  name: default
  namespace: cluster-bm            # Must match the cluster namespace
spec:
  enableBGPIPv4: false
  enableBGPIPv6: true
  peerSelector:
    flatipmode-peer: "true"
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer1
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.1.254
  sessions: 2
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer2
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.3.254
  sessions: 2

Dual-stack cluster (IPv4 Dynamic Flat IP, IPv6 Dynamic Flat IP)

The following cluster configuration file sample shows the settings for configuring a dual-stack cluster with flat-mode networking with BGP:

apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: bm
  namespace: cluster-bm
spec:
  ...
  clusterNetwork:
    advancedNetworking: true
    flatIPv4: true
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
      # Additional IPv6 CIDR block determines if the cluster is dual-stack
      - 2620:0:1000:2630:5:2::/112
  ... 
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: ClusterCIDRConfig          
metadata:
  name: cluster-wide-1
  namespace: cluster-bm          # Must match the cluster namespace
spec:
  ipv4:
    cidr: "222.2.0.0/16"
    perNodeMaskSize: 24
  ipv6:
    cidr: "2222:3::/112"
    perNodeMaskSize: 120
---
apiVersion: networking.gke.io/v1
kind: NetworkGatewayGroup
metadata:
  name: default
  namespace: cluster-bm           # Must match the cluster namespace
spec:
  floatingIPs:
  - 10.0.1.100
  - 10.0.3.100
---
apiVersion: baremetal.cluster.gke.io/v1alpha1
kind: FlatIPMode
metadata:
  name: default
  namespace: cluster-bm            # Must match the cluster namespace
spec:
  enableBGPIPv4: true
  enableBGPIPv6: true
  peerSelector:
    flatipmode-peer: "true"
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer1
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.1.254
  sessions: 2
---
apiVersion: networking.gke.io/v1
kind: BGPPeer
metadata:
  name: bgppeer2
  namespace: cluster-bm            # Must match the cluster namespace
  labels:
    flatipmode-peer: "true"
spec:
  localASN: 65001
  peerASN: 65002
  peerIP: 10.0.3.254
  sessions: 2
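
In each of the preceding samples, the FlatIPMode peerSelector value must match a label on every BGPPeer that you want to use for flat-mode networking. The following sketch counts the BGPPeer manifests in a configuration file that carry a given label line. It's a line-based text scan, not a YAML parser, and the function name count_matching_peers is hypothetical.

```shell
# Count BGPPeer manifests in a multi-document configuration file that contain
# the given label line. Documents are assumed to be separated by "---" lines.
# $1: path to the cluster configuration file.
# $2: label line to look for, such as 'flatipmode-peer: "true"'.
count_matching_peers() {
  awk -v label="$2" '
    # A separator closes the current document: count it, then reset the flags.
    /^---/ { if (is_peer && has_label) n++; is_peer = 0; has_label = 0; next }
    /^kind: BGPPeer/ { is_peer = 1 }
    index($0, label) { has_label = 1 }
    # The last document has no trailing separator, so count it here.
    END { if (is_peer && has_label) n++; print n + 0 }
  ' "$1"
}
```

For the samples in this document, count_matching_peers CLUSTER_CONFIG_FILE 'flatipmode-peer: "true"' should report 2, one for each BGPPeer. A result of 0 suggests that the selector and the peer labels don't agree.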

Troubleshooting

To help you troubleshoot issues related to flat-mode networking with BGP, this section includes instructions for checking your configuration:

  1. Verify that a FlatIPMode object exists in the cluster namespace on the admin cluster:

    kubectl get flatipmodes -A --kubeconfig ADMIN_KUBECONFIG
    

    The response should look something like this:

    NAMESPACE                 NAME      AGE
    cluster-bm                default   2d17h
    
  2. Verify that a flatipmodes.networking.gke.io object exists on the user cluster:

    The flatipmodes.networking.gke.io object is cluster scoped.

    kubectl get flatipmodes.networking.gke.io --kubeconfig USER_KUBECONFIG
    

    The response should look something like this:

    NAME      AGE
    default   2d17h
    
  3. Get the BGPSessions resources to view the current sessions:

    kubectl get bgpsessions -A --kubeconfig USER_KUBECONFIG
    

    The response should look something like this:

    NAMESPACE     NAME                LOCAL ASN   PEER ASN   LOCAL IP       PEER IP        STATE            LAST REPORT
    kube-system   10.0.1.254-node-01  65500       65000      10.0.1.100     10.0.1.254     Established      2s
    kube-system   10.0.1.254-node-02  65500       65000      10.0.3.100     10.0.1.254     NotEstablished   2s
    kube-system   10.0.3.254-node-01  65500       65000      10.0.1.100     10.0.3.254     NotEstablished   2s
    kube-system   10.0.3.254-node-02  65500       65000      10.0.3.100     10.0.3.254     Established      2s
    
  4. Get the BGPAdvertisedRoute resources to see the routes currently being advertised:

    kubectl get bgpadvertisedroutes -A --kubeconfig USER_KUBECONFIG
    

    The response should look something like this:

    NAMESPACE     NAME                       PREFIX         METRIC
    kube-system   route-via-222-22-208-240   222.2.0.0/24
    kube-system   route-via-222-22-209-240   222.2.1.0/24
    

    The route names indicate the next hop. For example, route-via-222-22-208-240 from the preceding example response indicates that the next hop for the advertised prefix 222.2.0.0/24 is 222.22.208.240.
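
The following helpers are a minimal sketch for reading the output shown in the preceding steps. They assume the IPv4 route naming and the column layout from the example responses, and the function names are hypothetical.

```shell
# Convert a route name such as route-via-222-22-208-240 to its next-hop
# IPv4 address by dropping the prefix and replacing hyphens with dots.
next_hop_from_route_name() {
  echo "${1#route-via-}" | tr '-' '.'
}

# Print BGPSessions rows whose STATE column (seventh field) isn't Established.
# Reads the output of "kubectl get bgpsessions -A" on standard input and
# skips the header row.
unestablished_sessions() {
  awk 'NR > 1 && $7 != "Established"'
}

next_hop_from_route_name route-via-222-22-208-240   # prints 222.22.208.240
```

To list only the sessions that need attention, pipe the command from the earlier step into the filter: kubectl get bgpsessions -A --kubeconfig USER_KUBECONFIG | unestablished_sessions.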