Bug 1987445

Summary: MetalLB integration: All gateway routers in the cluster answer ARP requests for LoadBalancer service IPs
Product: OpenShift Container Platform
Reporter: obraunsh
Component: Networking
Assignee: obraunsh
Networking sub component: ovn-kubernetes
QA Contact: Arti Sood <asood>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: akaris, anbhat, asood, cgoncalves, christoph.obexer, dcbw, fpaoline, gkopels, ibodunov, mapandey, mdonila, tidawson, trozet, vpickard
Version: 4.9
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
Clones: 2014003 (view as bug list)
Bug Blocks: 2014003
Last Closed: 2022-03-12 04:36:27 UTC

Attachments:
LoadBalancer Service points to nginx DaemonSet, is assigned IP 1.1.1.5 from MetalLB's pool

Description obraunsh 2021-07-29 14:52:35 UTC
Created attachment 1807384 [details]
LoadBalancer Service points to nginx DaemonSet, is assigned IP 1.1.1.5 from MetalLB's pool

This was already discussed on Slack; filing the BZ to pick the discussion back up.

Description of problem:
MetalLB's layer 2 mode works by having the speaker pod on the node selected to announce a LoadBalancer IP respond to ARP requests for that IP.
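For reference, a minimal sketch of the layer 2 configuration this relies on, applied with oc apply. The namespace, pool name and address range are illustrative assumptions, and depending on the MetalLB version the resource may be an AddressPool (as below) or an IPAddressPool plus L2Advertisement:

# Hypothetical layer 2 pool; namespace, name, CRD flavor and address range
# depend on the MetalLB version and environment actually in use.
cat <<'EOF' | oc apply -f -
apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
  name: l2-pool
  namespace: metallb-system
spec:
  protocol: layer2
  addresses:
  - 1.1.1.1-1.1.1.100
EOF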

What happens with OVN-Kubernetes is that when probing for an IP assigned to a LoadBalancer Service, the ARP request gets multiple replies, one from each node.
We believe this is because the ARP request finds its way into OVN on each node, hits the load balancer on that node's gateway router (GR), and is answered with the GR's MAC address (which is the same as the node's NIC).
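One way to see the per-node GR load balancers that we believe are answering (a sketch only; the ovnkube-master pod name, the nbdb container and the GR_<node-name> naming are assumptions based on a typical ovn-kubernetes deployment in an OCP cluster):

# List the load balancers attached to one node's gateway router and look for
# the Service VIP; <ovnkube-master-pod> and <node-name> are placeholders.
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c nbdb -- \
  ovn-nbctl lr-lb-list GR_<node-name>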

This interferes with how MetalLB's layer 2 mode expects these IPs to be announced: instead of a single speaker pod replying to an ARP request, replies arrive from n+1 MACs (the node running the speaker pod answers twice, once via br-ex and once via its GR).
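A quick way to map the replying MACs back to nodes (a sketch assuming "oc debug node/..." access and that br-ex is the external bridge on every node):

# Print each node's br-ex MAC so the arping replies can be matched to nodes.
for node in $(oc get nodes -o name); do
  echo "== $node"
  oc debug "$node" -- chroot /host ip -br link show br-ex
done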

Steps to Reproduce:
1. Deploy a DaemonSet/Deployment.
2. Create a LoadBalancer Service for the set of pods; MetalLB assigns it an IP.
3. arping that IP from an interface that can reach the cluster via L2 (a minimal sketch of these steps follows below).
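A minimal sketch of steps 1-3, loosely matching the nginx DaemonSet from the attachment; the names, labels, image and interface are illustrative assumptions:

# Steps 1-2: an nginx DaemonSet plus a LoadBalancer Service that MetalLB
# assigns an IP to (names, labels and image are illustrative only).
cat <<'EOF' | oc apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF

# Step 3: probe the assigned IP from an interface with L2 reachability to the
# cluster; <l2-interface> and <loadbalancer-ip> are placeholders.
arping -I <l2-interface> <loadbalancer-ip> -c 1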

Actual results from a local kind cluster running OVN:
arping -I br-da7d2a887a94 1.1.1.5 -c 1
ARPING 1.1.1.5 from 172.18.0.1 br-da7d2a887a94
Unicast reply from 1.1.1.5 [02:42:AC:12:00:04]  1.610ms
Unicast reply from 1.1.1.5 [02:42:AC:12:00:03]  1.741ms
Unicast reply from 1.1.1.5 [02:42:AC:12:00:02]  1.980ms
Unicast reply from 1.1.1.5 [02:42:AC:12:00:02]  2.452ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)

Expected results:
arping -I br-da7d2a887a94 1.1.1.5 -c 1
ARPING 1.1.1.5 from 172.18.0.1 br-da7d2a887a94
Unicast reply from 1.1.1.5 [02:42:AC:12:00:02]  2.452ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)


If this makes sense, I'd be happy to get the bug assigned to me.

Comment 1 Federico Paolinelli 2021-07-30 15:40:42 UTC
As additional information: this behavior might lead to security policies blocking the ARP traffic because it resembles ARP-spoofing attacks (learned the term today :-) ).

Comment 2 Arti Sood 2021-08-31 18:33:03 UTC
I tested it on a vSphere cluster and on a bare metal (BM) cluster.

1. Created a service with externalTrafficPolicy: Cluster, backed by a DaemonSet and a ReplicaSet.
2. Created a LoadBalancer type Service.
3. arping'd that IP from an interface that can reach the cluster via L2.

Found only one MAC responding, not all nodes responding.

 oc -n default rsh test-pod
sh-4.4# arping -I br-ex 10.0.96.171 -c 2
ARPING 10.0.96.171 from 10.0.99.1 br-ex
Unicast reply from 10.0.96.171 [FA:16:3E:49:14:4D]  2.900ms
Unicast reply from 10.0.96.171 [FA:16:3E:49:14:4D]  2.141ms
Sent 2 probes (1 broadcast(s))
Received 2 response(s)


Would it make a difference that it is a kind cluster?

Comment 3 Federico Paolinelli 2021-08-31 19:41:27 UTC
IIRC this was found on an OCP cluster and reproduced in kind.
Adding Gregory, who found this originally.

Comment 4 Arti Sood 2021-08-31 20:03:18 UTC
Used the following versions:

oc version
Client Version: 4.7.0
Server Version: 4.9.0-0.nightly-2021-08-30-232019
Kubernetes Version: v1.22.0-rc.0+d08c23e

OVN version

ovn21.09-21.09.0-15.el8fdp.x86_64
ovn21.09-host-21.09.0-15.el8fdp.x86_64
ovn21.09-central-21.09.0-15.el8fdp.x86_64
ovn21.09-vtep-21.09.0-15.el8fdp.x86_64

Comment 5 Greg Kopels 2021-10-03 07:35:49 UTC
After speaking with Arti, the only difference we can find between the two test executions is that Arti is using a cluster with an OpenStack underlay, whereas our lab is a full bare metal cluster. We are only seeing the arping issue on the bare metal cluster.

Comment 13 Greg Kopels 2021-11-18 12:43:26 UTC
I have verified the bug fix on version 4.10.0-0.nightly-2021-11-14-184249:

      ARPING 10.46.56.131 from 10.46.56.14 br-ex
      Unicast reply from 10.46.56.131 [34:48:ED:F3:88:C4]  2.030ms
      Unicast reply from 10.46.56.131 [34:48:ED:F3:88:C4]  1.109ms
      Sent 2 probes (1 broadcast(s))
      Received 2 response(s)

After the fix, only the node advertising the service replies to the ARP request.
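For completeness, the same check condensed into a one-liner; after the fix it should print a single MAC (interface and IP taken from this comment's environment):

# Count the distinct MACs replying for the service IP.
arping -I br-ex 10.46.56.131 -c 4 | grep -o '\[[^]]*\]' | sort -u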

Comment 24 errata-xmlrpc 2022-03-12 04:36:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056