Bug 2216056

Summary: team sends arp announcement with random mac after first port added
Product: Red Hat Enterprise Linux 7
Reporter: Curtis Taylor <cutaylor>
Component: NetworkManager
Assignee: NetworkManager Development Team <nm-team>
Status: CLOSED CURRENTRELEASE
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: medium
Version: 7.9
CC: arawal, bgalvani, lrintel, mgokhool, nm-team, prpatel, rkhan, sfaye, sukulkar, till
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-07-28 07:00:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Curtis Taylor 2023-06-19 23:39:42 UTC
Description of problem:
The team's gratuitous ARP has a random arp.src.hw_mac after the first port is added.

Version-Release number of selected component (if applicable):
RHEL7.9 with NetworkManager-1.18.8-2.el7_9.x86_64
(Also in RHEL8 until NetworkManager-1.34.0-0.3.el8)

How reproducible:
Easily in RHEL7.9 VM:

Steps to Reproduce:
1. RHEL7.9 VM
2. 
   nmcli con add type team ifname team1 con-name team1 team.runner activebackup ipv4.method manual ipv4.addresses $IP ipv6.method ignore
   nmcli con add type team-slave ifname ens7s0 con-name ens7s0 master team1

3. Watch $PORT externally, e.g.:
   tshark -i virbr0 -Y "arp.src.hw_mac != eth.src"

Actual results:
Duplicate use of the IP address ($IP) is detected: some gratuitous ARPs have arp.src.hw_mac != eth.src.

Expected results:
No duplicate use of the IP address is detected.
All gratuitous ARPs should have arp.src.hw_mac == eth.src.
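
The expected condition can also be checked offline against a saved capture. The following is only a sketch: it assumes the capture has been exported by tshark as tab-separated eth.src and arp.src.hw_mac columns, and the helper name find_mac_mismatches is hypothetical (it is not part of tshark or NetworkManager).

```shell
#!/bin/bash
# find_mac_mismatches is a hypothetical helper, not part of tshark or NetworkManager.
# It reads "eth.src<TAB>arp.src.hw_mac" pairs on stdin, prints every pair whose
# two MACs differ (case-insensitively), and returns 1 if any such pair was seen.
find_mac_mismatches() {
    awk -F'\t' '
        NF >= 2 && tolower($1) != tolower($2) {
            print "MISMATCH eth.src=" $1 " arp.src.hw_mac=" $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible invocation, with capture.pcap as a placeholder file name: tshark -r capture.pcap -Y arp -T fields -e eth.src -e arp.src.hw_mac | find_mac_mismatches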

Additional info:
The customer hits the issue during boot. We mitigated it mostly, but not entirely, by removing NM and PORT from the initramfs; the issue still occurred, though very infrequently. We found no NM connection delays that helped.

For step #2 of the reproduction steps, I used this script in case it takes a few attempts to reproduce. I found that the sleep after adding the team's port at the end of the script is key to witnessing the problem: too short a sleep causes the team to be recycled before the ARP announcement with the random MAC is emitted.

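#!/bin/bash
# Repeatedly recreate the team and its port so the race can be observed.
TEAM=team1
PORT=ens7s0
IP="192.168.11.11/24"
i=1
while true; do
    date
    echo "Test # $i"
    # Tear down and delete the connections left over from the previous iteration.
    nmcli con down "$TEAM"; nmcli con down "$PORT"; nmcli con del "$TEAM"; nmcli con del "$PORT"
    sleep 1
    nmcli con add type team ifname "$TEAM" con-name "$TEAM" team.runner activebackup ipv4.method manual ipv4.addresses "$IP" ipv6.method ignore
    sleep 1
    ip a show
    nmcli con add type team-slave ifname "$PORT" con-name "$PORT" master "$TEAM"
    # Wait long enough for the ARP announcement with the random MAC to be emitted.
    sleep 3
    i=$((i + 1))
done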

We tested in RHEL8 using NM builds and found:
* Fixed in RHEL8 starting with NetworkManager-1.36.0-0.1.el8.
* Issue footprint is exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1678796 from RHEL7.6.

Changelog between 1.34.0-0.3 and 1.36.0-0.1 is only this:

* Thu Nov 18 2021 Beniamino Galvani <bgalvani> - 1:1.36.0-0.1
- Upgrade to 1.35.1 release (development)
- core: refactor IP configuration code (rh #1868254)
- core: fix deleting external route during service restart (rh #2010640)

* Thu Oct 21 2021 Ana Cabral <acabral> - 1:1.34.0-0.3
- Upgrade to 1.33.4 release (development)
- Deprecate "master"/"slave" on bonding and bridge API (rh #1949023)
- core: Fix configuration reload for active devices (rh #1852445)
- Update systemd-udev dependency (rh #2012123)

$ git log --oneline --grep=l3 1.33.4-dev..1.35.1-dev 
...
58287cbcc0 core: rework IP configuration in NetworkManager using layer 3 configuration   <----- 44k lines including announce mentioned 68 times.

I am opening a BZ hoping thaller and/or bgalvani weigh in on whether or not there is a fix or workaround for this that can be added to RHEL7.9 NM.

Comment 6 Beniamino Galvani 2023-06-23 14:40:03 UTC
Hi,

the issue seems to be that NM starts sending the ARP announcements when the team interface still has a random MAC. Shortly after, a port gets attached to the team, and NM sends the remaining ARPs with the wrong (random) MAC.

What should happen instead is that NM waits for the team to have a port attached and for the team's MAC address to be inherited from one port; only then can it send ARP announcements.
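
As an illustration of that ordering only (this is not how NetworkManager implements it), the condition "the team has inherited a port's MAC" can be polled from sysfs. The helper name wait_for_inherited_mac and the SYSFS_NET override are hypothetical and exist just for this sketch.

```shell
#!/bin/bash
# SYSFS_NET is a hypothetical override used for testing; it defaults to real sysfs.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# wait_for_inherited_mac is a hypothetical helper, not NetworkManager code.
# $1: team interface, $2: port interface, $3: number of 0.1 s polls (default 30).
# Returns 0 once the team's MAC equals the port's MAC, 1 if that never happens.
wait_for_inherited_mac() {
    local team=$1 port=$2 tries=${3:-30}
    local team_mac port_mac
    while [ "$tries" -gt 0 ]; do
        team_mac=$(cat "$SYSFS_NET/$team/address" 2>/dev/null)
        port_mac=$(cat "$SYSFS_NET/$port/address" 2>/dev/null)
        if [ -n "$team_mac" ] && [ "$team_mac" = "$port_mac" ]; then
            return 0
        fi
        sleep 0.1
        tries=$((tries - 1))
    done
    return 1
}
```

Only after such a check succeeds would it be safe to announce the address with the team's MAC.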

This problem already appeared in the past, with the following timeline:

 - it was initially reported as [1] in RHEL 7.6 and fixed in RHEL 7.7 (NetworkManager 1.16);

 - then we found that the fix was incomplete [2] and a new fix was developed in RHEL 8.5 (NetworkManager 1.32.10);

 - later, in RHEL 8.6 (NetworkManager 1.36) there was a rework of the handling of IP configuration in NetworkManager.

In the case linked to this bz I see that the issue is reproducible with 1.34 and not with 1.36; this suggests that the IP configuration rework might have fixed the problem. Unfortunately, that rework can't be backported to RHEL7 because it is too big. On the other hand, the fix present in [2] should be easy to backport; therefore it would be useful to understand why it is not working as expected.

Would it be possible to have a 'journalctl -b' log with NetworkManager set at TRACE level, for both the following NM versions?

 - 1.32 or 1.34 (where the issue is reproduced)
 - 1.36 (where the issue is NOT reproduced)
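
One way to get TRACE-level logs is a NetworkManager configuration drop-in with a [logging] section. The sketch below writes such a file; the helper name write_nm_trace_conf and the file name 95-trace-logging.conf are arbitrary choices made for illustration.

```shell
#!/bin/bash
# write_nm_trace_conf is a hypothetical helper. It writes a drop-in that sets
# NetworkManager's log level to TRACE for all logging domains.
# $1: target directory (defaults to NetworkManager's conf.d directory).
write_nm_trace_conf() {
    local dir=${1:-/etc/NetworkManager/conf.d}
    # The file name is an arbitrary choice for this sketch.
    cat > "$dir/95-trace-logging.conf" <<'EOF'
[logging]
level=TRACE
domains=ALL
EOF
}
```

After writing the file, NetworkManager would need a restart (systemctl restart NetworkManager) before reproducing the issue and saving the output of journalctl -b.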

Do you have a setup to reproduce the issue? If so, can you share the script or the environment?

Thank you.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1678796
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1956793

Comment 7 Prijesh 2023-06-23 15:47:54 UTC
Hello @bgalvani ,

Thanks for checking this.

The issue is easily reproducible with the steps below:

1. Start the following script on the system:

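#!/bin/bash
# Repeatedly recreate the team and its port so the race can be observed.
TEAM=team1
PORT=ens7s0
IP="192.168.11.11/24"
i=1
while true; do
    date
    echo "Test # $i"
    # Tear down and delete the connections left over from the previous iteration.
    nmcli con down "$TEAM"; nmcli con down "$PORT"; nmcli con del "$TEAM"; nmcli con del "$PORT"
    sleep 1
    nmcli con add type team ifname "$TEAM" con-name "$TEAM" team.runner activebackup ipv4.method manual ipv4.addresses "$IP" ipv6.method ignore
    sleep 1
    ip a show
    nmcli con add type team-slave ifname "$PORT" con-name "$PORT" master "$TEAM"
    # Wait long enough for the ARP announcement with the random MAC to be emitted.
    sleep 3
    i=$((i + 1))
done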


2. Watch $PORT externally, e.g. (we used host bridge virbr0):
   tshark -i virbr0 -Y "arp.src.hw_mac != eth.src"


Are you okay with the above reproduction steps? If you still want us to generate TRACE logs from our VM, do let us know.


Thanks,
Prijesh

Comment 8 Beniamino Galvani 2023-06-24 07:55:28 UTC
Thanks, I can reproduce the bug with your script. The fix in 1.34 [1] works well with bonding but has issues with teaming. When a port is attached to the team interface, carrier goes up on the team while the interface still has the random MAC. After some time, the userspace teamd process selects the active port and then the MAC gets updated. Example log:

          ### no carrier, random MAC
  <trace> [1687552518.0684] platform-linux: event-notification: RTM_NEWLINK, flags 0, seq 0: 23: team1 <UP;broadcast,multicast,up> mtu 1500 arp 1 team* not-init addrgenmode none addr 2E:8A:C5:4A:AC:F2 brd FF:FF:FF:FF:FF:FF rx:0,0 tx:0,0
  <debug> [1687552519.1281] platform: (enp7s0) link: enslaving to master 'team1'
          ### carrier goes up
  <debug> [1687552519.1283] platform: (team1) signal: link changed: 23: team1 <UP,LOWER_UP;broadcast,multicast,up,lowerup> mtu 1500 arp 1 team* init addrgenmode eui64 addr 2E:8A:C5:4A:AC:F2 brd FF:FF:FF:FF:FF:FF driver team rx:0,0 tx:0,0
  <info>  [1687552519.1289] device (team1): carrier: link connected
          ### start sending ARP with wrong MAC
  <debug> [1687552519.1291] acd[0x7fed74007ca0,23]: announcing address 192.168.11.11 (hw-addr 2E:8A:C5:4A:AC:F2)
  teamd_team1[9427]: Found best port: "enp7s0" (ifindex "3", prio "0").
          ### now the MAC is updated
  <trace> [1687552519.1311] platform-linux: event-notification: RTM_NEWLINK, flags 0, seq 0: 23: team1 <UP,LOWER_UP;broadcast,multicast,up,lowerup> mtu 1500 arp 1 team* not-init addrgenmode eui64 addr 52:54:00:2D:F5:1D brd FF:FF:FF:FF:FF:FF rx:0,0 tx:0,0
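
To spot the same pattern in a longer log, the announced address and hardware address can be pulled out of the acd lines. This is a sketch that assumes the exact "announcing address ... (hw-addr ...)" wording shown in the log above; the helper name announced_macs is hypothetical.

```shell
#!/bin/bash
# announced_macs is a hypothetical helper. It prints "<ip> <hw-addr>" for every
# "announcing address" line found in the given log files (or on stdin when no
# file is given). The pattern assumes the log wording shown in this comment.
announced_macs() {
    sed -n 's/.*announcing address \([0-9.]*\) (hw-addr \([0-9A-Fa-f:]*\)).*/\1 \2/p' "$@"
}
```

For example, journalctl -b | announced_macs lists each announcement, which can then be compared against the MAC the team ends up with.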
  
1.36 fixes that problem by restarting ARP announcements when the MAC of the team changes. However, the fix in 1.36 can't be backported because it is part of a big rework. A workaround to the problem would be to set a static MAC on the team interface, as in:

  nmcli connection modify $TEAM ethernet.cloned-mac-address $(cat /sys/class/net/$PORT/address)

In this way, the team interface is created from the beginning with the right MAC. Would it be an acceptable solution?
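
A slightly more defensive variant of that command is sketched below. The MAC format check is an addition for illustration, and the helper name pin_team_mac_to_port as well as the NMCLI and SYSFS_NET overrides are hypothetical.

```shell
#!/bin/bash
# NMCLI and SYSFS_NET are hypothetical overrides added for dry runs and testing.
NMCLI=${NMCLI:-nmcli}
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# pin_team_mac_to_port is a hypothetical helper: it copies the port's current MAC
# into the team connection's ethernet.cloned-mac-address property.
# $1: team connection name, $2: port interface name.
pin_team_mac_to_port() {
    local team=$1 port=$2 mac
    mac=$(cat "$SYSFS_NET/$port/address") || return 1
    # Refuse anything that does not look like a 6-byte MAC address.
    if ! printf '%s\n' "$mac" | grep -Eqx '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}'; then
        echo "unexpected MAC for $port: $mac" >&2
        return 1
    fi
    $NMCLI connection modify "$team" ethernet.cloned-mac-address "$mac"
}
```

Running it with NMCLI=echo prints the nmcli arguments instead of applying them, which is handy for checking the result first.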

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1956793

Comment 9 Prijesh 2023-06-26 11:04:18 UTC
Hello @bgalvani 

Thanks for digging into this further.

I will share the information with the customer and will update here as soon as we get feedback from them.

I have tried these steps in my VM and it looks promising.

Just to check with you: I have given them a workaround of setting "team.notify-peers-count 10 team.notify-peers-interval 1000". Do you think that is a good option?

+++
I tested this on my local system and observed the following (70+ attempts):

1st g-ARP was sent with the correct MAC
2nd g-ARP was sent with the random MAC
remaining g-ARPs were sent with the correct MAC

Because the last several packets are sent with the correct MAC, they override the previous random MAC, which can help.

+++

I am asking so that we can add a possible workaround to the KCS that we have created for this issue.


Thanks,
Prijesh

Comment 10 Beniamino Galvani 2023-06-26 13:02:46 UTC
Yes, it seems notify-peers could help to make sure the right MAC is announced after the random one. Another alternative would be to delay the ARP announcements by enabling DAD (duplicate address detection) for IPv4. To do that, set for example `ipv4.dad-timeout 3000`. As said, another approach would be to set a fixed MAC on the team.
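
The two property-based workarounds could be applied roughly as follows. This is a sketch: the function names and the NMCLI override are hypothetical, and the values are simply the ones quoted in this thread.

```shell
#!/bin/bash
# NMCLI is a hypothetical override so the commands can be previewed (NMCLI=echo).
NMCLI=${NMCLI:-nmcli}

# Option 1: repeat the peer notifications so that later ARPs carry the right MAC.
# The values 10 and 1000 are the ones quoted in comment 9.
# $1: team connection name.
apply_notify_peers() {
    $NMCLI connection modify "$1" team.notify-peers-count 10 team.notify-peers-interval 1000
}

# Option 2: delay the announcements by enabling IPv4 duplicate address detection.
# $1: team connection name, $2: timeout in milliseconds (3000 is the example above).
apply_dad_timeout() {
    $NMCLI connection modify "$1" ipv4.dad-timeout "${2:-3000}"
}
```

For example, apply_notify_peers team1 followed by reactivating the connection (nmcli con up team1) would put the new settings into effect.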

Comment 11 Prijesh 2023-07-10 05:49:11 UTC
Till Maas,

I have asked the customer about the workarounds but have not got any response yet.

Once I get a response, I will update here.


Thanks,
Prijesh

Comment 12 Prijesh 2023-07-28 04:36:41 UTC
Hello,

We got a response that their end customer has not come back since our workarounds were provided, so I see no need to keep this BZ open; we can close it.

Thanks,
Prijesh

Comment 13 Beniamino Galvani 2023-07-28 07:00:01 UTC
Thanks, since the issue is already fixed in RHEL 8.6 and later, I'm closing the bz.