Problem
Environment
- Google Kubernetes Engine
- Shared Virtual Private Cloud
Solution
- Check whether a higher-priority deny rule is blocking the health-check traffic; if so, add an exception that allows it.
- If the Google Kubernetes Engine service project's service account (the GKE Service Robot) does not have enough permissions, the Shared VPC Admin can grant it a role in the host project that includes firewall permissions (create, delete, get, list). The predefined Compute Security Admin role (roles/compute.securityAdmin) covers this, but a custom role works as well.
- Use the following filter in the host project's Cloud Logging to find firewall creation failures:
resource.type="gce_firewall_rule"
protoPayload.methodName="v1.compute.firewalls.insert"
severity>=ERROR
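The two steps above can be sketched with gcloud. This is a sketch, not a verbatim procedure: HOST_PROJECT_ID and SERVICE_PROJECT_NUMBER are placeholders for your own projects, and the member string uses the standard naming of the GKE Service Robot account.

```shell
# Grant the service project's GKE Service Robot firewall-admin
# permissions in the host project via the predefined role.
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
  --member="serviceAccount:service-SERVICE_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
  --role="roles/compute.securityAdmin"

# Look for failed firewall-rule creations in the host project's logs.
gcloud logging read '
  resource.type="gce_firewall_rule"
  protoPayload.methodName="v1.compute.firewalls.insert"
  severity>=ERROR' \
  --project=HOST_PROJECT_ID --limit=10
```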
Cause
If the Service works with externalTrafficPolicy set to Cluster but not with Local, health checks are not reaching the nodes because the firewall is blocking them. Firewall rule creation is managed by Google Kubernetes Engine.
A Service of type: LoadBalancer creates a Network Load Balancer. When externalTrafficPolicy is set to Cluster, the load balancer is programmed with all nodes as backends; if the firewall blocks health checks, all of them are marked unhealthy. A target pool-based Network Load Balancer fails open, so it still sends traffic to a random backend node. That node knows which node hosts the wanted Pod and forwards the request, which is why the Service appears to work.
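For reference, a minimal Service manifest showing where the policy is set (the name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                      # illustrative name
spec:
  type: LoadBalancer                # provisions a Network Load Balancer
  externalTrafficPolicy: Cluster    # or Local; Local preserves the client IP
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```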
With externalTrafficPolicy set to Local, the Service allocates a healthCheckNodePort. Only nodes that have Pods belonging to that Service respond to health checks on this port; those nodes are considered healthy backends by the load balancer and receive its traffic.
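Assuming a Service named my-app (an illustrative name), the allocated port can be read from the Service spec and probed directly; kube-proxy serves the health endpoint on that port:

```shell
# Read the allocated healthCheckNodePort from the Service spec.
HC_PORT=$(kubectl get svc my-app -o jsonpath='{.spec.healthCheckNodePort}')

# From inside the VPC: a node with a local Pod of the Service answers 200,
# a node without one answers 503. NODE_IP is a placeholder.
curl -s -o /dev/null -w '%{http_code}\n' "http://NODE_IP:${HC_PORT}/healthz"
```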
When health checks fail (for whatever reason) with externalTrafficPolicy set to Local, the load balancer again fails open and picks a random node. Depending on the size of the cluster, chances are it hits a node without the correct Pod; with Local, that node does not forward the traffic, so the request times out.
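If GKE cannot create the rule itself, the Shared VPC Admin can add it manually in the host project. A sketch, with HOST_PROJECT_ID, NETWORK, HC_PORT, and GKE_NODE_TAG as placeholders; 130.211.0.0/22 and 35.191.0.0/16 are Google Cloud's documented health-check source ranges, but verify the current list for your load balancer type:

```shell
gcloud compute firewall-rules create allow-gke-health-checks \
  --project=HOST_PROJECT_ID \
  --network=NETWORK \
  --direction=INGRESS \
  --allow=tcp:HC_PORT \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=GKE_NODE_TAG
```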