Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2078939

Summary: MetalLB Layer2: All secondary interfaces in the cluster answer ARP requests for LoadBalancer services IP
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Version: 4.10
Reporter: elevin
Assignee: Andreas Karis <akaris>
QA Contact: Anurag saxena <anusaxen>
CC: ffernand, grajaiya, obraunsh
Status: CLOSED DUPLICATE
Severity: low
Priority: low
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-05-06 19:19:46 UTC
Attachments: topology (flags: none)

Description elevin 2022-04-26 14:19:37 UTC
Created attachment 1875102: topology

Comment 2 elevin 2022-04-26 14:38:55 UTC
Topology:
OCP hybrid cluster with 2 bare-metal workers that have secondary NICs.

worker1:
 1) server pod (nginx)

apiVersion: v1
kind: Pod
metadata:
 name: server
 namespace: default
 labels:
   app: nginx-local
spec:
 nodeSelector:
   kubernetes.io/hostname: "helix09.lab.eng.tlv2.redhat.com"       
 containers:
 - name: nginx
   image: docker-registry.upshift.redhat.com/cnf-gotests/cnf-gotests-client:v4.8
   imagePullPolicy: IfNotPresent
   securityContext:
     privileged: true
   command: ["/bin/sleep", "3650d"]
 
 2) Service for the server pod

apiVersion: v1
kind: Service
metadata:
 name: nginx-local
 namespace: default
 annotations:
   metallb.universe.tf/address-pool: addresspool
spec:
 ports:
 - port: 80
   targetPort: 80
 selector:
   app: nginx-local
 type: LoadBalancer
 externalTrafficPolicy: Cluster

 3) MetalLB address pool with load-balancer address 4.4.4.1

apiVersion: metallb.io/v1beta1
kind: AddressPool
metadata:
 name: addresspool
 namespace: metallb-system
 annotations:
   metallb.universe.tf/address-pool: addresspool 
spec:
 protocol: "layer2"
 autoAssign: true
 addresses:
   - 4.4.4.1/24

4) One of the node's secondary NICs is configured with IP address 4.4.4.20/24

worker2:
1) NetworkAttachmentDefinition (NAD) for the secondary interface

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
 name: internal
spec:
 config: '{
   "cniVersion": "0.3.1",
   "name": "internal",
   "type": "macvlan",
   "master": "ens1f0",
   "mode": "bridge",
   "ipam": {
    "type": "static"
   }}'

2) Client pod with IP address 4.4.4.10/24

Description of problem:
All secondary interfaces answer ARP requests for the load-balancer address.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
100%


Actual results:

sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.712ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.734ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.747ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.759ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.772ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.787ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.669ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.676ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.703ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.697ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.684ms
^CSent 6 probes (1 broadcast(s))
Received 11 response(s)
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.722ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.742ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.756ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.768ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.781ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.792ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.663ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.694ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.673ms
^CSent 4 probes (1 broadcast(s))
Received 9 response(s)
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.712ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.733ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.748ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.760ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.773ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.785ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.671ms

Expected results:
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.785ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.671ms

Comment 3 elevin 2022-04-26 14:39:25 UTC
Could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1987445

Comment 4 Andreas Karis 2022-04-27 14:28:56 UTC
The fix for that bug has been in 4.10 for a while already:
~~~
[akaris@linux ovn-kubernetes ((1a1f1cb28...))]$ git checkout downstream/release-4.10
HEAD is now at 1a1f1cb28 Merge pull request #1043 from JacobTanenbaum/release-4.10-BZ2074839
[akaris@linux ovn-kubernetes ((1a1f1cb28...))]$ git log --oneline | grep 91d37a667
91d37a667 Neighbor solicitations and ARP requests used to hit all 3 OVN load-balancers in addition to the node local IP for ExternalIP. ARP requests or IPv6 NS would receive <node number + 1> replies.
~~~

What version of OCP are you on?

Can you attach an `oc adm must-gather` for the environment, plus a sosreport from the 2 bare-metal workers (taken while the issue is reproducible)?

Also, please attach an `oc adm inspect` of namespace default and of namespace metallb-system:
~~~
exclude_list="secrets"
for namespace in default metallb-system; do
  oc adm inspect -n "$namespace" $(oc api-resources --verbs=get,list --namespaced=true | tail -n+2 | grep -Ev "$exclude_list" | awk '{print $1}' | tr '\n' ',' | sed 's/,$//')
done
~~~

Without further data, my guess is that this has something to do with the weak host model that Linux applies by default, given that you say the *secondary* interfaces answer the requests: https://serverfault.com/questions/834512/why-does-linux-answer-to-arp-on-incorrect-interfaces

Comment 8 Andreas Karis 2022-05-03 10:29:00 UTC
Those are indeed the secondary interfaces:
~~~
[akaris@linux sosreport-helix09-2078939-2022-04-27-sijwcld]$ egrep -RiI '0C:42:A1:BC:F7:B5|0C:42:A1:BC:F7:B4|B4:96:91:A5:79:D8|B4:96:91:A5:79:D9|40:A6:B7:37:0B:B0|40:A6:B7:37:0B:B1' sos_commands/networking/ip_-d_address -B1
5: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d8 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 
6: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b0 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 portid 40a6b7370bb0 
--
8: ens5f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d9 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 
9: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b1 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 portid 40a6b7370bb1 
10: ens8f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b4 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9978 numtxqueues 504 numrxqueues 126 gso_max_size 65536 gso_max_segs 65535 
--
12: con1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9978 numtxqueues 504 numrxqueues 126 gso_max_size 65536 gso_max_segs 65535
~~~
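The cross-referencing done above with `egrep` can also be scripted, which helps when there are many replies to match. This is a minimal Python sketch, assuming the `ip -d address` layout shown above (an `<index>: <name>:` header line followed by an indented `link/ether <mac>` line); the excerpt is trimmed to those lines, and `mac_to_interface` is a hypothetical helper.

```python
import re

# Excerpt of the `ip -d address` output quoted above, trimmed to the interface
# header line and the "link/ether" line of each interface.
IP_ADDR_EXCERPT = """\
5: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d8 brd ff:ff:ff:ff:ff:ff
6: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b0 brd ff:ff:ff:ff:ff:ff
8: ens5f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d9 brd ff:ff:ff:ff:ff:ff
9: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b1 brd ff:ff:ff:ff:ff:ff
10: ens8f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b4 brd ff:ff:ff:ff:ff:ff
12: con1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b5 brd ff:ff:ff:ff:ff:ff
"""

# "<index>: <name>:" (or "<name>@<parent>:" for VLAN/veth devices) starts a new interface.
HEADER_RE = re.compile(r"^\d+: ([^:@\s]+)[:@]")
# The MAC address follows "link/ether" on an indented line.
ETHER_RE = re.compile(r"^\s+link/ether ([0-9a-f:]{17})\s")


def mac_to_interface(ip_output):
    """Return a {lowercase MAC: interface name} dict built from `ip -d address` text."""
    mapping = {}
    current = None
    for line in ip_output.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        ether = ETHER_RE.match(line)
        if ether and current is not None:
            mapping[ether.group(1)] = current
    return mapping


# MACs seen in the arping replies (arping prints them in upper case).
ARPING_MACS = ["0C:42:A1:BC:F7:B5", "0C:42:A1:BC:F7:B4", "B4:96:91:A5:79:D8",
               "B4:96:91:A5:79:D9", "40:A6:B7:37:0B:B0", "40:A6:B7:37:0B:B1"]

table = mac_to_interface(IP_ADDR_EXCERPT)
for mac in ARPING_MACS:
    print(mac, "->", table.get(mac.lower(), "not found"))
```

Run against this excerpt, the six MACs resolve to con1, ens8f0, ens5f0, ens5f1, ens1f0 and ens1f1, i.e. every reply came from a different NIC of the same node.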

br-ex is set up on eno1:
~~~
    Bridge br-ex
        Port patch-br-ex_helix09.lab.eng.tlv2.redhat.com-to-br-int
            Interface patch-br-ex_helix09.lab.eng.tlv2.redhat.com-to-br-int
                type: patch
                options: {peer=patch-br-int-to-br-ex_helix09.lab.eng.tlv2.redhat.com}
        Port br-ex
            Interface br-ex
                type: internal
        Port eno1
            Interface eno1
                type: system
    ovs_version: "2.15.4"
~~~

0c:42:a1:bc:f7:b4 belongs to ens8f0. Why would you expect an answer only from that interface?

Looking at the MetalLB logs, I see this:
~~~
2022-04-27T19:39:05.934942354+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno2","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.934808813Z"}
2022-04-27T19:39:05.935304164+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno3","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935256781Z"}
2022-04-27T19:39:05.935663299+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935622792Z"}
2022-04-27T19:39:05.936016953+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens1f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935976391Z"}
2022-04-27T19:39:05.936391798+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno4","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.936354022Z"}
2022-04-27T19:39:05.936736370+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.936699128Z"}
2022-04-27T19:39:05.937083260+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens1f1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.937044345Z"}
2022-04-27T19:39:05.937437652+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens8f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.93739996Z"}
2022-04-27T19:39:05.937804846+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f2","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.937767432Z"}
2022-04-27T19:39:05.938228451+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"con1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.938188923Z"}
2022-04-27T19:39:05.938619744+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f3","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.938578436Z"}
2022-04-27T19:39:05.939277336+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ovn-k8s-mp0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.939236087Z"}
2022-04-27T19:39:05.941761515+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"br-ex","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.941659093Z"}
2022-04-27T19:46:45.658708094+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:46:45.658580732Z"}
2022-04-27T19:46:53.817697365+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:46:53.817628726Z"}
2022-04-27T19:47:00.418073855+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:47:00.41800957Z"}
2022-04-27T19:48:18.158371909+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:48:18.158305844Z"}
2022-04-27T19:48:51.245370835+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:48:51.245309699Z"}
2022-04-27T19:49:17.437141112+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:49:17.437068643Z"}
2022-04-27T19:49:27.621727590+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:49:27.621665703Z"}
2022-04-27T19:49:32.149888414+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:49:32.149812064Z"}
2022-04-27T19:49:38.598366388+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:49:38.598299614Z"}
2022-04-27T19:52:00.553881698+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:52:00.553806527Z"}
2022-04-27T19:52:07.376234142+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:52:07.376169965Z"}
2022-04-27T19:52:51.798253643+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:52:51.798191904Z"}
2022-04-27T19:53:24.686169310+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:53:24.68611128Z"}
2022-04-27T19:54:34.566718170+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:54:34.566654086Z"}
2022-04-27T19:54:37.724914387+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:54:37.72486047Z"}
2022-04-27T19:56:02.135702137+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:56:02.135624037Z"}
2022-04-27T19:56:22.758102195+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:56:22.758036081Z"}
2022-04-27T19:57:59.014697502+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:57:59.01463786Z"}
2022-04-27T19:58:32.185455811+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:58:32.185379151Z"}
~~~

I suppose that would explain the behavior you are seeing.
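To check which interfaces a speaker answers ARP on, the `createARPResponder` events can be pulled out of its log. This is a minimal Python sketch, assuming either the node log-file format shown above (timestamp, stream, tag, then a JSON payload) or plain JSON lines as printed by `oc logs`; `arp_responder_interfaces` is a hypothetical helper, and only a few of the lines above are reproduced.

```python
import json

# A few of the speaker log lines quoted above ("<timestamp> <stream> <tag> <JSON payload>").
LOG_LINES = [
    '2022-04-27T19:39:05.934942354+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno2","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.934808813Z"}',
    '2022-04-27T19:39:05.936016953+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens1f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935976391Z"}',
    '2022-04-27T19:39:05.937437652+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens8f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.93739996Z"}',
    '2022-04-27T19:39:05.941761515+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"br-ex","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.941659093Z"}',
    '2022-04-27T19:46:45.658708094+00:00 stdout F {"caller":"level.go:63","event":"serviceAnnounced","ips":["4.4.4.1"],"level":"info","msg":"service has IP, announcing","pool":"addresspool","protocol":"layer2","ts":"2022-04-27T19:46:45.658580732Z"}',
]


def arp_responder_interfaces(lines):
    """Return the interfaces for which the speaker logged a createARPResponder event."""
    interfaces = []
    for line in lines:
        line = line.strip()
        if line.startswith("{"):
            payload = line  # plain JSON, e.g. as printed by `oc logs`
        else:
            # Split off timestamp, stream and tag; whatever remains is the JSON payload.
            parts = line.split(" ", 3)
            if len(parts) < 4:
                continue
            payload = parts[3]
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if record.get("event") == "createARPResponder":
            interfaces.append(record["interface"])
    return interfaces


print(arp_responder_interfaces(LOG_LINES))
```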

Comment 9 Andreas Karis 2022-05-03 10:31:14 UTC
This is also covered in the MetalLB documentation; it is simply how MetalLB works:

https://metallb.universe.tf/faq/#in-layer-2-mode-how-to-specify-the-host-interface-for-an-address-pool

In layer 2 mode, how to specify the host interface for an address pool?

There’s no need: MetalLB automatically listens/advertises on all interfaces. That might sound like a problem, but because of the way ARP/NDP works, only clients on the right network will know to look for the service IP on the network.

NOTE Because of the way layer 2 mode functions, this works with tagged vlans as well. Specify the network and the ip stack figures out the rest.

Comment 10 Andreas Karis 2022-05-03 10:32:49 UTC
So I'd recommend moving the secondary interfaces onto different VLANs, or shutting them down on the switch (or on the servers).

Comment 11 elevin 2022-05-04 08:44:30 UTC
1) In the test, I use the secondary interface ens1f0, which is already VLAN-isolated from the other secondary interfaces.
2) Every ARP request gets multiple responses in a different order, and the resolved MAC address changes each time, so the traffic doesn't work:

sh-4.4# ip neigh | grep 4.4.4.1
4.4.4.1 dev net1 lladdr 0c:42:a1:bc:f7:b0 REACHABLE
sh-4.4# arping -I net1 4.4.4.1 -c2
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [40:A6:B7:38:B4:E1]  0.726ms
Unicast reply from 4.4.4.1 [40:A6:B7:38:B4:E0]  0.773ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:9D:E0]  0.786ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B0]  0.798ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:9D:E1]  0.810ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B1]  0.821ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B1]  0.694ms
Sent 2 probes (1 broadcast(s))
Received 7 response(s)
sh-4.4# ip neigh | grep 4.4.4.1
4.4.4.1 dev net1 lladdr 40:a6:b7:38:b4:e1 REACHABLE

sh-4.4# arping -I net1 4.4.4.1 -c2
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [B4:96:91:A5:9D:E0]  0.695ms
Unicast reply from 4.4.4.1 [40:A6:B7:38:B4:E1]  0.735ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B0]  0.747ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B1]  0.757ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:9D:E1]  0.781ms
Unicast reply from 4.4.4.1 [40:A6:B7:38:B4:E0]  0.792ms
Unicast reply from 4.4.4.1 [40:A6:B7:38:B4:E0]  0.683ms
Sent 2 probes (1 broadcast(s))
Received 7 response(s)
sh-4.4# ip neigh | grep 4.4.4.1
4.4.4.1 dev net1 lladdr b4:96:91:a5:9d:e0 REACHABLE
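In both runs above, the MAC that `ip neigh` reports afterwards is the first reply of that run, so the neighbor entry for 4.4.4.1 moves to a different NIC after every exchange. The Python sketch below only replays the two runs to check that observation. The "first reply wins" model is an assumption that fits this data (and the `locktime` neighbor setting described in arp(7), which ignores updates to a recently updated entry); it is not a statement of the kernel's exact logic.

```python
# Replies from the two `arping -I net1 4.4.4.1 -c2` runs above, in arrival order,
# paired with the lladdr that `ip neigh` showed for 4.4.4.1 right after each run.
RUNS = [
    (["40:A6:B7:38:B4:E1", "40:A6:B7:38:B4:E0", "B4:96:91:A5:9D:E0", "0C:42:A1:BC:F7:B0",
      "B4:96:91:A5:9D:E1", "0C:42:A1:BC:F7:B1", "0C:42:A1:BC:F7:B1"],
     "40:a6:b7:38:b4:e1"),
    (["B4:96:91:A5:9D:E0", "40:A6:B7:38:B4:E1", "0C:42:A1:BC:F7:B0", "0C:42:A1:BC:F7:B1",
      "B4:96:91:A5:9D:E1", "40:A6:B7:38:B4:E0", "40:A6:B7:38:B4:E0"],
     "b4:96:91:a5:9d:e0"),
]


def first_reply_wins(replies):
    """Assumed model: the neighbor entry keeps the first reply and ignores later ones."""
    return replies[0].lower()


for replies, observed in RUNS:
    predicted = first_reply_wins(replies)
    print(f"{len(set(replies))} distinct MACs; predicted {predicted}; "
          f"observed {observed}; match: {predicted == observed}")
```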

Comment 12 Andreas Karis 2022-05-04 11:24:46 UTC
2) You are getting an ARP reply from every secondary interface of the node running the announcing MetalLB speaker, because MetalLB listens on all interfaces of that node. Whenever an ARP request reaches one of those interfaces, MetalLB answers it out of that interface.

I can't comment on your last example: I cannot find those MAC addresses in the sosreports you gave me, so I suppose they are from a different environment, or at least from different nodes:
~~~
[akaris@linux 2078939]$ grep -i B4:96:91:A5:9D:E0 -RlI 2>/dev/null
[akaris@linux 2078939]$ 
~~~

1) If you are getting ARP responses from the MAC addresses of the secondary interfaces, then your interfaces are very likely not isolated. Can you run tcpdump on those secondary interfaces on the worker nodes while running arping? My guess is that you will see the ARP requests and responses on those secondary interfaces.

See my last reply. In that case, these are the node's interfaces with the MAC addresses in question:
~~~
[akaris@linux sosreport-helix09-2078939-2022-04-27-sijwcld]$ egrep -RiI '0C:42:A1:BC:F7:B5|0C:42:A1:BC:F7:B4|B4:96:91:A5:79:D8|B4:96:91:A5:79:D9|40:A6:B7:37:0B:B0|40:A6:B7:37:0B:B1' sos_commands/networking/ip_-d_address -B1
5: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d8 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 
6: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b0 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 portid 40a6b7370bb0 
--
8: ens5f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:a5:79:d9 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 
9: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 40:a6:b7:37:0b:b1 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 80 numrxqueues 80 gso_max_size 65536 gso_max_segs 65535 portid 40a6b7370bb1 
10: ens8f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b4 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9978 numtxqueues 504 numrxqueues 126 gso_max_size 65536 gso_max_segs 65535 
--
12: con1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:42:a1:bc:f7:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9978 numtxqueues 504 numrxqueues 126 gso_max_size 65536 gso_max_segs 65535
~~~

You can see in the MetalLB logs that it sets up an ARP responder for each of those interfaces:
~~~
2022-04-27T19:39:05.934942354+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno2","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.934808813Z"}
2022-04-27T19:39:05.935304164+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno3","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935256781Z"}
2022-04-27T19:39:05.935663299+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935622792Z"}
2022-04-27T19:39:05.936016953+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens1f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.935976391Z"}
2022-04-27T19:39:05.936391798+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"eno4","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.936354022Z"}
2022-04-27T19:39:05.936736370+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.936699128Z"}
2022-04-27T19:39:05.937083260+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens1f1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.937044345Z"}
2022-04-27T19:39:05.937437652+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens8f0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.93739996Z"}
2022-04-27T19:39:05.937804846+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f2","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.937767432Z"}
2022-04-27T19:39:05.938228451+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"con1","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.938188923Z"}
2022-04-27T19:39:05.938619744+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ens5f3","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.938578436Z"}
2022-04-27T19:39:05.939277336+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"ovn-k8s-mp0","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.939236087Z"}
2022-04-27T19:39:05.941761515+00:00 stdout F {"caller":"level.go:63","event":"createARPResponder","interface":"br-ex","level":"info","msg":"created ARP responder for interface","ts":"2022-04-27T19:39:05.941659093Z"}
~~~

And that matches your arping results (all of the MAC addresses are within the set 0C:42:A1:BC:F7:B5|0C:42:A1:BC:F7:B4|B4:96:91:A5:79:D8|B4:96:91:A5:79:D9|40:A6:B7:37:0B:B0|40:A6:B7:37:0B:B1):
~~~
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.712ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.734ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.747ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.759ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.772ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.787ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.669ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.676ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.703ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.697ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.684ms
^CSent 6 probes (1 broadcast(s))
Received 11 response(s)
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.722ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.742ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.756ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.768ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.781ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.792ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.663ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.694ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.673ms
^CSent 4 probes (1 broadcast(s))
Received 9 response(s)
sh-4.4# arping -I net1 4.4.4.1
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.712ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.733ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.748ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.760ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.773ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.785ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.671ms
~~~
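The subset check described above can be scripted for any number of arping runs. This is a minimal Python sketch, assuming arping's `Unicast reply from <ip> [<MAC>]` line format shown in this bug; `reply_counts` is a hypothetical helper, `NODE_MACS` is the set quoted above, and only the third run is reproduced.

```python
import re
from collections import Counter

# Third arping run from above.
ARPING_OUTPUT = """\
ARPING 4.4.4.1 from 4.4.4.10 net1
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D8]  0.712ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B1]  0.733ms
Unicast reply from 4.4.4.1 [40:A6:B7:37:0B:B0]  0.748ms
Unicast reply from 4.4.4.1 [B4:96:91:A5:79:D9]  0.760ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B5]  0.773ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.785ms
Unicast reply from 4.4.4.1 [0C:42:A1:BC:F7:B4]  0.671ms
"""

# MACs of the node's interfaces, from the `ip -d address` output above.
NODE_MACS = {"0C:42:A1:BC:F7:B5", "0C:42:A1:BC:F7:B4", "B4:96:91:A5:79:D8",
             "B4:96:91:A5:79:D9", "40:A6:B7:37:0B:B0", "40:A6:B7:37:0B:B1"}

REPLY_RE = re.compile(r"^Unicast reply from \S+ \[([0-9A-Fa-f:]{17})\]")


def reply_counts(output):
    """Count how many replies each MAC sent in an arping transcript."""
    counts = Counter()
    for line in output.splitlines():
        match = REPLY_RE.match(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


counts = reply_counts(ARPING_OUTPUT)
print("distinct MACs:", len(counts))
print("all from the node's interfaces:", set(counts) <= NODE_MACS)
print("MACs not on the node:", sorted(set(counts) - NODE_MACS))
```

Fed the runs from comment 11 instead, the last line would list MACs outside this set, which is consistent with those runs coming from different nodes.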


My guess is that this happens (simplified):

source host sends arp request to 4.4.4.1   ----> |                                                                              | ------> source host receives arp reply to 4.4.4.1 from ens1f0, ens1f1, ens5f0
                                                 |----> ens1f0 (the request reaches ens1f0)                                     |
                                                        ----> metallb's ARPResponder answers 4.4.4.1 is at mac of ens1f0  ----> |
                                                 |----> ens1f1 (the request reaches ens1f1)                                     |
                                                        ----> metallb's ARPResponder answers 4.4.4.1 is at mac of ens1f1  ----> |
                                                 |----> ens5f1 (the request reaches ens5f1)                                     | 
                                                        ----> metallb's ARPResponder answers 4.4.4.1 is at mac of ens5f1  ----> |
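The simplified flow can be written down as a toy model to compare the current setup with the VLAN isolation suggested in comment 10. This Python sketch assumes, as the flow above does, that one speaker owns 4.4.4.1 and answers on every NIC that receives the request; the `segment` labels are hypothetical stand-ins for the switch VLANs, and nothing here is MetalLB code.

```python
from typing import NamedTuple


class Nic(NamedTuple):
    name: str
    mac: str
    segment: str  # hypothetical label for the L2 broadcast domain (switch VLAN) of the NIC


def arp_replies(nics, client_segment):
    """Toy model: the speaker owning the service IP runs an ARP responder on every NIC,
    and every NIC that receives the client's broadcast request answers with its own MAC."""
    return [nic.mac for nic in nics if nic.segment == client_segment]


NAMES_AND_MACS = [("ens1f0", "40:a6:b7:37:0b:b0"), ("ens1f1", "40:a6:b7:37:0b:b1"),
                  ("ens5f0", "b4:96:91:a5:79:d8"), ("ens5f1", "b4:96:91:a5:79:d9"),
                  ("ens8f0", "0c:42:a1:bc:f7:b4"), ("con1", "0c:42:a1:bc:f7:b5")]

# Scenario 1: all of these NICs share the client's broadcast domain.
shared = [Nic(name, mac, "vlan-a") for name, mac in NAMES_AND_MACS]
# Scenario 2: only ens1f0 is on the client's VLAN; every other NIC is on its own VLAN.
isolated = [Nic(name, mac, "vlan-a" if name == "ens1f0" else f"vlan-{name}")
            for name, mac in NAMES_AND_MACS]

print("shared segment:", len(arp_replies(shared, "vlan-a")), "replies")
print("isolated VLANs:", arp_replies(isolated, "vlan-a"))
```

If ens1f0 really were isolated (point 1 in comment 11), the model predicts a single reply; seeing several replies therefore suggests the request is reaching the other NICs too, which the tcpdump requested above would confirm.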

Comment 13 Andreas Karis 2022-05-04 11:28:18 UTC
If what I described in my previous comment is happening (ARP requests reaching each of those interfaces, which you can verify with tcpdump), then this works as designed; see:
https://bugzilla.redhat.com/show_bug.cgi?id=2078939#c9

Comment 14 Andreas Karis 2022-05-04 12:43:12 UTC
When you create a service of type LoadBalancer, the MetalLB controller assigns it an IP address, and a single speaker takes ownership of that IP and starts announcing it.

The following output is from my lab:
~~~
[root@openshift-jumpserver-0 ~]# oc get svc
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
nginx-deployment   LoadBalancer   172.30.54.232   192.168.123.90   80:32400/TCP   9m4s
~~~

~~~
[root@openshift-jumpserver-0 metallb]# for p in $(oc get pods -n metallb-system -o name) ; do echo === $p ===;  oc logs -n metallb-system $p | grep 192.168.123.90; done
=== pod/controller-7b95d98fb4-x2lpx ===
{"caller":"service.go:114","event":"ipAllocated","ip":"192.168.123.90","msg":"IP address assigned by controller","service":"network-test/nginx-deployment","ts":"2022-05-04T11:35:37.329850909Z"}
{"caller":"service.go:114","event":"ipAllocated","ip":"192.168.123.90","msg":"IP address assigned by controller","service":"nginx/nginx-deployment","ts":"2022-05-04T11:40:30.57020712Z"}
=== pod/metallb-operator-controller-manager-9455695f4-z99kw ===
=== pod/speaker-449ws ===
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:37:36.089422829Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:37:36.132314812Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:37:36.17778973Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:37:37.091376743Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:37:37.100526647Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"network-test/nginx-deployment","ts":"2022-05-04T11:38:50.266332266Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"nginx/nginx-deployment","ts":"2022-05-04T11:40:42.380517446Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"nginx/nginx-deployment","ts":"2022-05-04T11:40:42.539531834Z"}
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"nginx/nginx-deployment","ts":"2022-05-04T11:40:42.587976911Z"}
{"caller":"arp.go:102","interface":"br-ex","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"52:54:00:00:00:b3","senderIP":"192.168.123.1","senderMAC":"52:54:00:00:00:b1","ts":"2022-05-04T11:41:39.723431292Z"}
=== pod/speaker-5hzsc ===
=== pod/speaker-h75vr ===
=== pod/speaker-kwbqf ===
=== pod/speaker-ll2wg ===
=== pod/speaker-zlnvn ===
~~~
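The "single speaker owns the IP" behavior, and the failover shown next, can be illustrated with a deterministic hash-based election. This Python sketch uses rendezvous-style hashing as an assumption for illustration; it is not MetalLB's implementation, and the node names and the `owner` function are hypothetical. The point is that every speaker, given the same node list and service IP, computes the same owner without coordination.

```python
import hashlib


def owner(nodes, service_ip):
    """Pick the announcing node by hashing node name + service IP (rendezvous-style).

    Every speaker runs this on the same inputs, so all of them agree on the owner
    without talking to each other. Illustrative only; not MetalLB's code."""
    if not nodes:
        return None

    def score(node):
        return hashlib.sha256(f"{node}#{service_ip}".encode()).digest()

    return min(nodes, key=score)


# Hypothetical node names for a six-node lab (six speaker pods appear above).
NODES = ["master-0", "master-1", "master-2", "worker-0", "worker-1", "worker-2"]
SERVICE_IP = "192.168.123.90"

first = owner(NODES, SERVICE_IP)
print("announcing node:", first)

# Failover: if the owner goes away, the remaining speakers agree on a new owner.
survivors = [n for n in NODES if n != first]
print("after the owner fails:", owner(survivors, SERVICE_IP))
```

A property of this scheme is that removing a node other than the owner does not move the IP; only the owner's failure changes it.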

If something happens to that speaker process, another node takes over as part of the failover mechanism. For example, I rebooted master 0, which hosted speaker-449ws, and that caused a failover to the worker node running speaker-5hzsc:
~~~
[root@openshift-jumpserver-0 metallb]# for p in $(oc get pods -n metallb-system -o name) ; do echo === $p ===;  oc logs -n metallb-system $p | grep 192.168.123.90; done
=== pod/controller-7b95d98fb4-x2lpx ===
{"caller":"service.go:114","event":"ipAllocated","ip":"192.168.123.90","msg":"IP address assigned by controller","service":"network-test/nginx-deployment","ts":"2022-05-04T11:35:37.329850909Z"}
{"caller":"service.go:114","event":"ipAllocated","ip":"192.168.123.90","msg":"IP address assigned by controller","service":"nginx/nginx-deployment","ts":"2022-05-04T11:40:30.57020712Z"}
=== pod/metallb-operator-controller-manager-9455695f4-z99kw ===
=== pod/speaker-5hzsc ===
(...)
{"caller":"main.go:287","event":"serviceAnnounced","ip":"192.168.123.90","msg":"service has IP, announcing","pool":"doc-example","protocol":"layer2","service":"nginx/nginx-deployment","ts":"2022-05-04T12:04:20.963247082Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:20.963793578Z"}
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:20.96382165Z"}
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:22.064162433Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:22.064162502Z"}
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:23.164535957Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:23.164535942Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:24.263682008Z"}
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:24.263681965Z"}
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:25.363984251Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.90","senderMAC":"18:66:da:9f:c6:6b","ts":"2022-05-04T12:04:25.364092268Z"}
=== pod/speaker-8dr2g ===
Error from server: Get "https://192.168.123.200:10250/containerLogs/metallb-system/speaker-8dr2g/speaker": dial tcp 192.168.123.200:10250: connect: connection refused
=== pod/speaker-h75vr ===
=== pod/speaker-kwbqf ===
=== pod/speaker-ll2wg ===
=== pod/speaker-zlnvn ===
~~~
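To see which speaker announced the IP most recently, and therefore where the failover landed, the output of the loop above can be piped through a small filter. This is a minimal sketch, assuming the `=== pod/NAME ===` headers and JSON log lines shown here. The function name `last_announcements` is hypothetical and not part of MetalLB or `oc`.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MetalLB): for each speaker pod, print the
# timestamp of its most recent "serviceAnnounced" event for a given IP.
# Input on stdin: output of the per-pod "oc logs ... | grep IP" loop.
# Assumes each pod's log lines appear in chronological order.
last_announcements() {
  local ip="$1"
  awk -v ip="\"ip\":\"$ip\"" -v ev='"event":"serviceAnnounced"' '
    /^=== pod\// { pod = $2; next }        # header line: remember the pod name
    index($0, ev) && index($0, ip) {       # literal substring match, no regex
      if (!(pod in last)) order[++n] = pod # keep first-seen order for output
      ts = $0
      sub(/.*"ts":"/, "", ts)              # strip everything up to the ts value
      sub(/".*/, "", ts)                   # strip the closing quote and brace
      last[pod] = ts                       # later lines overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print order[i], last[order[i]] }
  '
}
~~~

Usage, appended to the loop from above: `for p in $(oc get pods -n metallb-system -o name) ; do echo === $p ===; oc logs -n metallb-system $p | grep 192.168.123.90; done | last_announcements 192.168.123.90`. Matching on the full `"ip":"…"` key with the closing quote avoids `192.168.123.9` accidentally matching `192.168.123.90`.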

You can see that ARP requests for the same IP may arrive on different interfaces, and the speaker answers with a different MAC address depending on the interface (I checked, and those ARP requests do hit the interfaces listed here in my lab):
~~~
{"caller":"arp.go:102","interface":"enp5s0f1","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c1","senderIP":"192.168.123.1","senderMAC":"52:54:00:00:00:b1","ts":"2022-05-04T12:10:09.202164934Z"}
{"caller":"arp.go:102","interface":"enp5s0f0","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"f8:f2:1e:83:16:c0","senderIP":"192.168.123.1","senderMAC":"52:54:00:00:00:b1","ts":"2022-05-04T12:10:09.202147161Z"}
{"caller":"arp.go:102","interface":"br-ex","ip":"192.168.123.90","msg":"got ARP request for service IP, sending response","responseMAC":"18:66:da:9f:c6:6b","senderIP":"192.168.123.1","senderMAC":"52:54:00:00:00:b1","ts":"2022-05-04T12:10:09.202293315Z"}
~~~
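The pairing between receiving interface and answered MAC can be pulled out of the speaker logs mechanically, which makes it easy to spot when more than one interface answers for the same IP. A minimal sketch, assuming the `arp.go:102` JSON log format shown above, where the `interface` field precedes `responseMAC`; `arp_responders` is a hypothetical helper name.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MetalLB): print each distinct
# "interface responseMAC" pair found in speaker ARP-response log lines
# for one IP. Assumes the arp.go:102 JSON layout, where "interface"
# appears before "responseMAC" on each line.
arp_responders() {
  local ip="$1"
  # Fixed-string matches (-F) so the dots in the IP are not regex wildcards.
  grep -F '"msg":"got ARP request for service IP, sending response"' |
    grep -F "\"ip\":\"$ip\"" |
    sed -n 's/.*"interface":"\([^"]*\)".*"responseMAC":"\([^"]*\)".*/\1 \2/p' |
    LC_ALL=C sort -u
}
~~~

For the three log lines above, this would list `br-ex`, `enp5s0f0` and `enp5s0f1`, each with its own MAC, i.e. three answers to a single request.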

Here's what's happening, as described in my earlier diagram (note that when I filtered for ARP, the BPF expression did not show me the ARP request, so I omitted the filter altogether and grepped instead):
~~~
sh-4.4# tcpdump -nnr /var/tmp/enp5s0f0.pcap  | grep 123.90
reading from file /var/tmp/enp5s0f0.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
12:40:30.953565 ARP, Request who-has 192.168.123.90 tell 192.168.123.1, length 46
12:40:30.953676 ARP, Reply 192.168.123.90 is-at f8:f2:1e:83:16:c0, length 46
sh-4.4# 
sh-4.4# tcpdump -nnr /var/tmp/enp5s0f1.pcap  | grep 123.90
reading from file /var/tmp/enp5s0f1.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
12:40:30.953565 ARP, Request who-has 192.168.123.90 tell 192.168.123.1, length 46
12:40:30.953678 ARP, Reply 192.168.123.90 is-at f8:f2:1e:83:16:c1, length 46
sh-4.4# tcpdump -nnr /var/tmp/br-ex.pcap  | grep 123.90
reading from file /var/tmp/br-ex.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
12:40:30.953756 ARP, Request who-has 192.168.123.90 tell 192.168.123.1, length 46
12:40:30.953811 ARP, Reply 192.168.123.90 is-at 18:66:da:9f:c6:6b, length 46
~~~
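Since `grep 123.90` treats the dot as a regex wildcard, a fixed-string match is slightly safer when scanning the tcpdump text output for replies. A minimal sketch, assuming the `tcpdump -nnr` line format shown above (`ARP, Reply IP is-at MAC, length N`); `arp_reply_macs` is a hypothetical helper, and the capture paths are the ones used in this lab.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper: from "tcpdump -nnr" text on stdin, print the MAC in
# every ARP reply for the given IP ("ARP, Reply IP is-at MAC, length N").
arp_reply_macs() {
  local ip="$1"
  # -F: match the IP literally instead of as a regex.
  grep -F "ARP, Reply $ip is-at " |
    sed -n 's/.* is-at \([0-9a-fA-F:]*\),.*/\1/p'
}
~~~

Usage across the three captures: `for f in enp5s0f0 enp5s0f1 br-ex; do printf '%s: ' "$f"; tcpdump -nnr "/var/tmp/$f.pcap" 2>/dev/null | arp_reply_macs 192.168.123.90; done`. With the captures above, each file yields a reply with a different MAC for the same request from 192.168.123.1.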

The code for this is here:
https://github.com/metallb/metallb/blob/992cd925818176c607e2e7a7dcf2d5712a8ddcf8/internal/layer2/arp.go#L102

Comment 15 Andreas Karis 2022-05-04 12:46:28 UTC
I do not have the logs for all of your MetalLB speakers, so I can't verify this, but you should be able to run something like this:
~~~
for p in $(oc get pods -n metallb-system -o name) ; do echo === $p ===;  oc logs -n metallb-system $p | grep 4.4.4.1; done
~~~

You should see the "got ARP request for service IP, sending response" messages there. After that, I'd recommend running tcpdump on the interfaces and checking your switch configuration.
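To check which speaker pods and interfaces answer ARP for the service IP (4.4.4.1 in your case), the loop's output can be tallied per pod and interface. A minimal sketch, assuming the `=== pod/NAME ===` headers and the `arp.go` JSON log format shown earlier in this bug; `arp_tally` is a hypothetical helper name.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MetalLB): count "sending response" log lines
# per speaker pod and receiving interface for one service IP.
# Input on stdin: output of the per-pod "oc logs ... | grep IP" loop.
arp_tally() {
  local ip="$1"
  awk -v ip="\"ip\":\"$ip\"" \
      -v msg='"msg":"got ARP request for service IP, sending response"' '
    /^=== pod\// { pod = $2; next }           # header: remember current pod
    index($0, msg) && index($0, ip) {
      iface = $0
      sub(/.*"interface":"/, "", iface)        # keep text after the key
      sub(/".*/, "", iface)                    # drop everything after the value
      key = pod " " iface
      if (!(key in count)) order[++n] = key    # first-seen order for output
      count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  '
}
~~~

For example: `for p in $(oc get pods -n metallb-system -o name) ; do echo === $p ===; oc logs -n metallb-system $p | grep 4.4.4.1; done | arp_tally 4.4.4.1`. More than one pod, or more than one interface per pod, in the output means several responders are answering for the same IP.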

Comment 16 Andreas Karis 2022-05-06 19:19:46 UTC
Sorry for being so eager to close this out, but so far I have no indication that this is a bug; it looks as if this works as I described, i.e., as designed.

Comment 17 elevin 2022-05-13 20:04:46 UTC
The issue should be resolved by fixing this BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2068303

*** This bug has been marked as a duplicate of bug 2068303 ***