Troubleshoot Distributed Cloud connected

Google remotely monitors and maintains the Google Distributed Cloud connected hardware. For this purpose, Google engineers have Secure Shell (SSH) access to the Distributed Cloud connected hardware. If Google detects an issue, a Google engineer contacts you to troubleshoot and resolve it. If you have identified an issue yourself, contact Google Support immediately to diagnose and resolve it.

Network connectivity loss

If the Distributed Cloud connected hardware loses its connection to Google Cloud and remains disconnected for 120 seconds, the Distributed Cloud connected control plane marks the affected Pods as NotReady and initiates Pod eviction.

To mitigate this, you must plan your Distributed Cloud connected configuration and architect your workloads for your chosen level of availability. For more information, see Availability best practices.
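
Eviction timing for a specific workload also depends on its tolerations: in standard Kubernetes, Pods tolerate the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints for a bounded time before eviction. As a quick check (POD_NAME and NAMESPACE are placeholders), you can inspect a Pod's tolerations:

    kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.tolerations}'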

Corrupt BGP sessions in Cloud Router resources used by VPN connections

Distributed Cloud VPN connections rely on BGP sessions established and managed by their corresponding Cloud Router resources to advertise routes between the Distributed Cloud connected cluster and Google Cloud. If you modify the configuration of a Cloud Router resource associated with a Distributed Cloud VPN connection, that connection can stop functioning.

To recover the corrupt BGP session configuration in the affected Cloud Router, complete the following steps:

  1. In the Google Cloud console, get the name of the corrupt BGP session. For example:

    INTERFACE=anthos-mcc-34987234
    
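    You can also list the BGP session names defined on the Cloud Router with the gcloud CLI; ROUTER_NAME, REGION, and VPC_PROJECT_ID are placeholders for the Cloud Router associated with the affected VPN connection, its region, and its project:

      gcloud compute routers describe ROUTER_NAME \
        --region=REGION \
        --project=VPC_PROJECT_ID \
        --format="value(bgpPeers[].name)"
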
  2. Get the peer BGP IP address and the Cloud Router BGP IP address for the corrupted BGP session, as well as the peer ASN used by the affected Distributed Cloud VPN connection. For example:

    GDCE_BGP_IP=169.254.208.74
    CLOUD_ROUTER_BGP_IP=169.254.208.73
    PEER_ASN=65506
    

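    You can also read these values from the Cloud Router resource itself; the bgpPeers and interfaces fields below are part of the Compute Engine Router resource:

      gcloud compute routers describe ROUTER_NAME \
        --region=REGION \
        --project=VPC_PROJECT_ID \
        --format="yaml(bgpPeers,interfaces)"
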
    If you deleted the BGP session, get this information from the Distributed Cloud connected cluster instead:

    1. Get the cluster credentials:

      gcloud edge-cloud container clusters get-credentials CLUSTER_ID \
        --location REGION \
        --project PROJECT_ID
      

      Replace the following:

      • CLUSTER_ID: the name of the target cluster.
      • REGION: the Google Cloud region in which the target cluster is created.
      • PROJECT_ID: the ID of the target Google Cloud project.
    2. Get the configuration of the MultiClusterConnectivityConfig resource:

      kubectl get multiclusterconnectivityconfig -A
      

      The command returns output similar to the following:

       NAMESPACE     NAME                  LOCAL ASN   PEER ASN
       kube-system   MultiClusterConfig1   65506       65505
      
    3. Get the peer BGP IP address, the Cloud Router IP address, and the BGP session ASN:

      kubectl describe multiclusterconnectivityconfig -n kube-system MCC_CONFIG_NAME   
      

      Replace MCC_CONFIG_NAME with the name of the MultiClusterConnectivityConfig resource that you obtained in the previous step.

      The command returns output similar to the following:

       Spec:
         Asns:
           Peer:  65505
           Self:  65506 # GDCE ASN
         Tunnels:
           Ike Key:
             Name:       MCC_CONFIG_NAME-0
             Namespace:  kube-system
           Peer:
             Bgp IP:      169.254.208.73 # Cloud Router BGP IP
             Private IP:  34.157.98.148
             Public IP:   34.157.98.148
           Self:
             Bgp IP:      169.254.208.74 # GDCE BGP IP
             Private IP:  10.100.29.49
             Public IP:   208.117.254.68
      
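    If you prefer machine-readable output over the describe format, you can also fetch the resource directly as YAML:

      kubectl get multiclusterconnectivityconfig -n kube-system MCC_CONFIG_NAME -o yaml
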
  3. In the Google Cloud console, get the name, region, and Google Cloud project ID for the corrupted VPN tunnel. For example:

    VPN_TUNNEL=vpn-tunnel-1
    REGION=us-east1
    VPC_PROJECT_ID=vpc-project-1
    
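    You can also list the VPN tunnels in the project to find these values:

      gcloud compute vpn-tunnels list --project=VPC_PROJECT_ID
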
  4. Delete the corrupted BGP session from the Cloud Router configuration.

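    For example, the following commands remove the BGP peer and its interface; ROUTER_NAME is a placeholder for the affected Cloud Router, and BGP_SESSION_NAME and INTERFACE_NAME typically both match the session name that you obtained in step 1:

      gcloud compute routers remove-bgp-peer ROUTER_NAME \
         --peer-name=BGP_SESSION_NAME \
         --region=REGION \
         --project=VPC_PROJECT_ID

      gcloud compute routers remove-interface ROUTER_NAME \
         --interface-name=INTERFACE_NAME \
         --region=REGION \
         --project=VPC_PROJECT_ID
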
  5. Create a new Cloud Router interface:

    gcloud compute routers add-interface ROUTER_NAME \
       --interface-name=INTERFACE_NAME \
       --vpn-tunnel=TUNNEL_NAME \
       --ip-address=ROUTER_BGP_IP \
       --project=VPC_PROJECT_ID \
       --region=REGION \
       --mask-length=30
    

    Replace the following:

    • ROUTER_NAME: the name of the Cloud Router associated with the affected Distributed Cloud VPN connection.
    • INTERFACE_NAME: a descriptive name that uniquely identifies this interface.
    • TUNNEL_NAME: the name of the VPN tunnel that you obtained in the previous step.
    • ROUTER_BGP_IP: the BGP IP address of the Cloud Router that you obtained earlier in this procedure.
    • VPC_PROJECT_ID: the ID of the target VPC Google Cloud project.
    • REGION: the Google Cloud region in which the target VPN tunnel has been created.
  6. Create the BGP peer:

    gcloud compute routers add-bgp-peer ROUTER_NAME \
       --interface=INTERFACE_NAME \
       --peer-name=TUNNEL_NAME \
       --region=REGION \
       --project=VPC_PROJECT_ID \
       --peer-ip-address=GDCE_BGP_IP \
       --peer-asn=GDCE_BGP_ASN \
       --advertised-route-priority=100 \
       --advertisement-mode=DEFAULT
    

    Replace the following:

    • ROUTER_NAME: the name of the Cloud Router on which you created the interface in the previous step.
    • INTERFACE_NAME: the name of the interface that you created in the previous step.
    • TUNNEL_NAME: the name of the VPN tunnel that you used to create the interface in the previous step.
    • REGION: the Google Cloud region in which the target VPN tunnel has been created.
    • VPC_PROJECT_ID: the ID of the target VPC Google Cloud project.
    • GDCE_BGP_IP: the Distributed Cloud peer BGP IP address that you obtained earlier in this procedure.
    • GDCE_BGP_ASN: the Distributed Cloud peer BGP ASN that you obtained earlier in this procedure.

At this point, the BGP session is back up and operational.
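
To verify, check the BGP session state on the Cloud Router; once the tunnel reconverges, the session should report a state of Established:

    gcloud compute routers get-status ROUTER_NAME \
       --region=REGION \
       --project=VPC_PROJECT_ID \
       --format="yaml(result.bgpPeerStatus)"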

Node stuck in the Ready,SchedulingDisabled state

When you apply or delete a NodeSystemConfigUpdate or SriovNetworkNodePolicy resource, the target node might reboot. When a node reboots, its status changes to NotReady or SchedulingDisabled. If a node remains in the Ready,SchedulingDisabled state for more than 30 minutes, do the following:

  1. Check the configuration and status of the corresponding NodeSystemConfigUpdate or SriovNetworkNodePolicy resource. If the SriovNetworkNodePolicy resource does not exist, the node does not support SR-IOV.
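
    For example, you can list both resource types across all namespaces (using the lowercased resource kinds) and check the node status:

      kubectl get nodesystemconfigupdate -A
      kubectl get sriovnetworknodepolicy -A
      kubectl get nodes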

  2. If the resource status is Succeeded, enable scheduling on the node using the following command:

    kubectl uncordon NODE_NAME
    

    Replace NODE_NAME with the name of the target node.
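
    To confirm that the node is schedulable again, verify that SchedulingDisabled no longer appears in its status:

      kubectl get node NODE_NAME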

  3. If the issue persists, contact Google Support.