Bug 2188102
Summary: active-backup bond configured with an 802.3ad bond as a slave has incorrect speed/duplex information
Product: Red Hat Enterprise Linux 9
Version: CentOS Stream
Component: kernel
Kernel sub component: Bonding
Status: CLOSED ERRATA
Fixed In Version: kernel-5.14.0-364.el9
Last Closed: 2024-04-30 10:09:49 UTC
Type: Bug
Keywords: Triaged
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Andrew Schorr <ajschorr>
Assignee: Hangbin Liu <haliu>
QA Contact: LiLiang <liali>
CC: bstinson, fbaudin, haliu, jwboyer, mhou, network-qe
Description
Andrew Schorr
2023-04-19 19:18:14 UTC
I did a simple test on RHEL 9 and could not reproduce this issue.

# create bonds
```
echo +bond0 > /sys/class/net/bonding_masters
echo 4 > /sys/class/net/bond0/bonding/mode
echo 100 > /sys/class/net/bond0/bonding/miimon
ip link set bond0 up
ifenslave bond0 ens1f0 ens1f1
#source /mnt/tests/kernel/networking/common/network.sh
#get_iface_sw_port "ens1f0 ens1f1" sw p
#swcfg setup_port_channel $sw "$p" active
echo +bond1 > /sys/class/net/bonding_masters
echo 1 > /sys/class/net/bond1/bonding/mode
echo 100 > /sys/class/net/bond1/bonding/miimon
ip link set bond1 up
ifenslave bond1 bond0 ens4f0np0
```

# ethtool bond1
Settings for bond1:
	Supported ports: [ ]
	Supported link modes: Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 50000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Link detected: yes

[root@dell-per740-86 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.14.0-296.2191_828573416.el9.x86_64

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: b4:96:91:a5:9f:50
Active Aggregator Info:
	Aggregator ID: 2
	Number of ports: 2
	Actor Key: 21
	Partner Key: 47
	Partner Mac Address: b0:8b:d0:0a:73:3b

Slave Interface: ens1f0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: b4:96:91:a5:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:a5:9f:50
    port key: 21
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: b0:8b:d0:0a:73:3b
    oper key: 47
    port priority: 32768
    port number: 353
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: b4:96:91:a5:9f:51
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: b4:96:91:a5:9f:50
    port key: 21
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: b0:8b:d0:0a:73:3b
    oper key: 47
    port priority: 32768
    port number: 357
    port state: 63

[root@dell-per740-86 ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v5.14.0-296.2191_828573416.el9.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: bond0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: bond0
MII Status: up
Speed: 50000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: b4:96:91:a5:9f:50
Slave queue ID: 0

Slave Interface: ens4f0np0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:0f:53:7f:88:a0
Slave queue ID: 0

# uname -r
5.14.0-296.2191_828573416.el9.x86_64

When the bonds are set up with NetworkManager, the active-backup bond cannot show correct speed information right after the bonding devices are created. However, after the LACP bond is brought down and up again, the active-backup bond displays the speed correctly.
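In the first test above, bond0 reports 50000 Mb/s with two 25000 Mbps slaves in its active aggregator, and the active-backup bond1 reports the same 50000 Mb/s for its active slave bond0. The sketch below cross-checks that sum from /proc/net/bonding text. It is a minimal illustration that assumes the text layout shown in this comment; `lacp_expected_speed` is a hypothetical helper name, not part of any bonding tool.

```shell
#!/bin/sh
# Hypothetical helper: sum the "Speed:" values of the slaves that belong to
# the active aggregator of an 802.3ad bond, parsing the text format of
# /proc/net/bonding/<bond> read on stdin.
# Usage: lacp_expected_speed < /proc/net/bonding/bond0
lacp_expected_speed() {
    awk '
        # Remember the ID printed under "Active Aggregator Info:".
        /^Active Aggregator Info:/ { want_id = 1; next }
        want_id && /Aggregator ID:/ { active_id = $3; want_id = 0; next }
        # Start of a per-slave section: reset the remembered speed.
        /^Slave Interface:/ { in_slave = 1; speed = 0; next }
        # "Speed: 25000 Mbps" gives 25000; "Speed: Unknown" counts as 0.
        in_slave && /^Speed:/ { speed = $2 + 0; next }
        # The slave section names its aggregator; count it only if active.
        in_slave && /^Aggregator ID:/ {
            if ($3 == active_id) total += speed
            in_slave = 0
        }
        END { print total + 0 }
    '
}
```

Slaves that sit in a different (inactive) aggregator are deliberately left out of the total, since only the active aggregator carries traffic.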
# setup
```
nmcli con add con-name mybond1 type bond ifname mybond1 bond.options "mode=1,miimon=100,updelay=5000,primary=mybond0"
nmcli con add con-name mybond0 type bond ifname mybond0 bond.options "mode=802.3ad,miimon=100,updelay=5000" master mybond1
nmcli con add con-name ens1f0 type ethernet ifname ens1f0 master mybond0
nmcli con add con-name ens1f1 type ethernet ifname ens1f1 master mybond0
#$nmcli con up mybond0
nmcli con add con-name ens4f0np0 type ethernet ifname ens4f0np0 master mybond1
nmcli con up mybond1
```

# After creation, the LACP bond displays speed info, but the active-backup bond does not
[root@dell-per740-86 ~]# ethtool mybond0
Settings for mybond0:
	Supported ports: [ ]
	Supported link modes: Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 50000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Link detected: yes
[root@dell-per740-86 ~]# ethtool mybond1
Settings for mybond1:
	Supported ports: [ ]
	Supported link modes: Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Unknown! (255)
	Auto-negotiation: off
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Link detected: yes
[root@dell-per740-86 ~]# cat /proc/net/bonding/mybond0
Ethernet Channel Bonding Driver: v5.14.0-296.2191_828573416.el9.x86_64

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:0f:53:7f:88:a0
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 2
	Actor Key: 21
	Partner Key: 47
	Partner Mac Address: b0:8b:d0:0a:73:3b

Slave Interface: ens1f0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:a5:9f:50
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:0f:53:7f:88:a0
    port key: 21
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: b0:8b:d0:0a:73:3b
    oper key: 47
    port priority: 32768
    port number: 353
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:a5:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 00:0f:53:7f:88:a0
    port key: 21
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: b0:8b:d0:0a:73:3b
    oper key: 47
    port priority: 32768
    port number: 357
    port state: 63
[root@dell-per740-86 ~]# cat /proc/net/bonding/mybond1
Ethernet Channel Bonding Driver: v5.14.0-296.2191_828573416.el9.x86_64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: mybond0 (primary_reselect always)
Currently Active Slave: mybond0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: ens4f0np0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0f:53:7f:88:a0
Slave queue ID: 0

Slave Interface: mybond0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ea:7e:04:ad:e4:e5
Slave queue ID: 0

But after re-upping the LACP bond, the active-backup bond shows the correct link speed:

# re-up lacp bond
[root@dell-per740-86 ~]# ip link set mybond0 down
[root@dell-per740-86 ~]# ip link set mybond0 up
# let lacp bond become active slave
[root@dell-per740-86 ~]# ip link set ens4f0np0 down
[root@dell-per740-86 ~]# ip link set ens4f0np0 up
# check ab bond speed info
[root@dell-per740-86 ~]# ethtool mybond1
Settings for mybond1:
	Supported ports: [ ]
	Supported link modes: Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes: Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 50000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Link detected: yes

Yes, I duplicated your result. The initial bond1 speed/duplex after booting is Unknown!/Unknown!, but setting bond0 down and then back up fixes the problem. But that shouldn't be necessary. This seems like a bug.

Is there a reason that this has not been accepted as a bug? Surely this behavior is not OK. After rebooting, bond1 comes up with speed and duplex unknown:

sh-5.1$ head /sys/class/net/bond1/{speed,duplex}
==> /sys/class/net/bond1/speed <==
-1

==> /sys/class/net/bond1/duplex <==
unknown
sh-5.1$ sudo ethtool bond1 | egrep -i 'speed|duplex'
	Speed: Unknown!
	Duplex: Unknown! (255)

Regards,
Andy

Hangbin, is this a bug?
Regards,
Liang Li

(In reply to Andrew Schorr from comment #3)
> Yes, I duplicated your result. The initial bond1 speed/duplex after booting
> is Unknown!/Unknown!,
> but setting bond0 down and then back up fixes the problem. But that
> shouldn't be necessary.
> This seems like a bug.

This is because you add bond1 on top of bond0 before bond0 has exchanged the 802.3ad (LACP) link info with its peer, so bond1 only gets the 10G link info from bond0. After that, bond0 syncs the link info and updates to 20G. If you first set bond0 up so that it exchanges and syncs the link info, and only then add bond1 on top of bond0, you will get the correct speed. Here is a reproducer:

```
#!/bin/bash
s_ns="s"
c_ns="c"
# Start clean. Note: this deletes ALL network namespaces on the host.
ip -a netns del
sleep 1
ip netns add ${c_ns}
ip netns add ${s_ns}
# An 802.3ad bond in each namespace, plus an active-backup bond1 in "s"
ip -n ${c_ns} link add bond0 type bond mode 802.3ad miimon 100
ip -n ${s_ns} link add bond0 type bond mode 802.3ad miimon 100
ip -n ${s_ns} link add bond1 type bond mode active-backup miimon 100
# veth pairs eth0/eth1 join bond0 on both sides; eth2 is left for bond1
for i in $(seq 0 2); do
	ip -n ${c_ns} link add eth${i} type veth peer name eth${i} netns ${s_ns}
	[ $i -eq 2 ] && break
	ip -n ${c_ns} link set eth${i} master bond0
	ip -n ${s_ns} link set eth${i} master bond0
done
ip -n ${c_ns} link set eth2 up
ip -n ${c_ns} link set bond0 up

# set bond0 up to sync the info
# ip -n ${s_ns} link set bond0 up
# sleep 5
# ip -n ${s_ns} link set bond0 down

ip -n ${s_ns} link set bond0 master bond1
ip -n ${s_ns} link set eth2 master bond1
ip -n ${s_ns} link set bond1 up

ip netns exec ${c_ns} ethtool bond0 | grep Speed
ip netns exec ${s_ns} ethtool bond0 | grep Speed
ip netns exec ${s_ns} ethtool bond1 | grep Speed
```

When you run the reproducer directly, you will see:

# ./bond_topo_lacp.sh
	Speed: 10000Mb/s
	Speed: 10000Mb/s
	Speed: 10000Mb/s

If you uncomment the following part

```
# ip -n ${s_ns} link set bond0 up
# sleep 5
# ip -n ${s_ns} link set bond0 down
```

and re-run the test, you will get:

# ./bond_topo_lacp.sh
	Speed: 20000Mb/s
	Speed: 20000Mb/s
	Speed: 20000Mb/s

So I think this is a configuration issue, not a bug. What do you think?
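The reproducer relies on a fixed `sleep 5` so that bond0 finishes its LACP exchange before it is enslaved to bond1. A minimal sketch of polling for the aggregate speed instead is shown below. It assumes the expected value of 20000 taken from the reproducer output (two 10000 Mb/s veth ports), and that the speed is read while bond0 is still up, since reading the speed of a down interface fails. `wait_for_value` is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command until its output equals an expected
# value, retrying once per second up to RETRIES extra times.
# Returns 0 on a match, 1 if the retries run out.
# Usage: wait_for_value EXPECTED RETRIES COMMAND [ARGS...]
wait_for_value() {
    expected=$1
    retries=$2
    shift 2
    while :; do
        # Ignore read errors (e.g. the speed of a down device is unreadable).
        current=$("$@" 2>/dev/null)
        [ "$current" = "$expected" ] && return 0
        [ "$retries" -le 0 ] && return 1
        retries=$((retries - 1))
        sleep 1
    done
}

# Possible use in place of the commented-out block in the reproducer:
#   ip -n ${s_ns} link set bond0 up
#   wait_for_value 20000 10 ip netns exec ${s_ns} cat /sys/class/net/bond0/speed
#   ip -n ${s_ns} link set bond0 down
```

Polling makes the setup finish as soon as the aggregate is formed and fails loudly if it never forms, rather than silently continuing with a stale 10000 Mb/s value.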
Thanks,
Hangbin

Hi,

Thanks for working on this. I agree 100% that there are race conditions involved. However, I still think that the current behavior is buggy. The fact is that an 802.3ad bond has a dynamic speed setting: if a slave goes down or up, the speed of the 802.3ad bond decreases or increases. The active-backup bond bond1, which is the parent of bond0, must therefore also have a dynamic speed value. When the speed of its active slave interface changes, it must also change its speed.

With the sleep uncommented, can you please, at the end of the script, use "ip link set ... down" to disable one of the slave links of the 802.3ad bond? After you do that, does the speed of bond1 update properly? If not, then there's a bug. I'm 99% certain that you will see a bug. The active-backup bond needs to understand that the speed of its slaves may be dynamic. There need to be hooks in the kernel to trigger the bond master to reevaluate its speed when a slave goes up or down.

Regards,
Andy

Created attachment 1959829 [details]
script to demonstrate that the active-backup bond speed is not updating when the speed of the underlying link changes
This script demonstrates that the top-level active-backup link does
not dynamically update when the speed/duplex of the underlying link changes.
This looks like a kernel bug to me. There need to be kernel hooks so that
the master can react when the speed/duplex of a slave changes.
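The attached script itself is not reproduced in this report. As a hedged illustration of the same kind of check, the sketch below compares an active-backup bond's reported speed with that of its current active slave, using the standard sysfs files `bonding/active_slave` and `speed`. `check_ab_speed` and the `SYSFS_NET` override (used only to point the check at a fake directory tree for testing) are hypothetical names.

```shell
#!/bin/sh
# Hypothetical check: does an active-backup bond report the same speed as its
# currently active slave? Reads:
#   <net>/<bond>/bonding/active_slave, <net>/<bond>/speed, <net>/<slave>/speed
# where <net> defaults to /sys/class/net (override with SYSFS_NET).
# Returns 0 if the speeds match, 1 on a mismatch, 2 if there is no active slave.
check_ab_speed() {
    bond=$1
    net=${SYSFS_NET:-/sys/class/net}
    active=$(cat "$net/$bond/bonding/active_slave" 2>/dev/null)
    if [ -z "$active" ]; then
        echo "$bond: no active slave"
        return 2
    fi
    bond_speed=$(cat "$net/$bond/speed" 2>/dev/null)
    slave_speed=$(cat "$net/$active/speed" 2>/dev/null)
    if [ "$bond_speed" != "$slave_speed" ]; then
        echo "$bond: reports ${bond_speed:-?} Mb/s, active slave $active reports ${slave_speed:-?} Mb/s"
        return 1
    fi
    echo "$bond: $bond_speed Mb/s, same as active slave $active"
    return 0
}
```

Run inside the namespace that holds bond1, this would flag the state in the results below, where bond0 has dropped to 10000 Mb/s while bond1 still reports 20000 Mb/s.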
To be clear, when I run that script, I see this:

sh-5.2# ./bondbug.sh
	Speed: 20000Mb/s
	Speed: 20000Mb/s
	Speed: 20000Mb/s
Now disabling eth0 in namespace c
	Speed: 10000Mb/s
	Speed: 10000Mb/s
	Speed: 20000Mb/s

That's on Fedora 37 with kernel 6.2.12-200. I'm puzzled by what's going on here. It seems blindingly obvious to me that there's a kernel bug here. How is that not obvious to you? Let's move forward. It could well be an upstream issue. How do we make progress on this?

Regards,
Andy

(In reply to Andrew Schorr from comment #8)
>
> This script demonstrates that the top-level active-backup link does
> not dynamically update when the speed/duplex of the underlying link changes.
> This looks like a kernel bug to me. There need to be kernel hooks so that
> the master can react when the speed/duplex of a slave changes.

The bonding interface only updates the speed/duplex dynamically in 802.3ad mode. In other modes, it only updates the speed/duplex when there is a failover or a slave change. The speed change of the lower bond0 doesn't trigger a failover, so the speed of the upper bond1 won't change. If you still think this is a kernel bug or a design issue, I'd suggest you ask about it upstream.

Thanks,
Hangbin

I understand the current behavior: it does not dynamically update the speed/duplex of the active-backup bond when the speed/duplex of the active slave changes. I don't THINK that it's a bug; I KNOW that it's a bug. Unless the kernel folks say that it's invalid to have a bond on top of another bond and that they don't support this configuration. But that would be silly.

I'm willing to report it upstream, but they don't want me to. When I go to https://bugzilla.kernel.org/ it says: "This bugzilla is for reporting bugs against upstream Linux kernels. If you did not compile your own kernel from scratch, you are probably in the wrong place. Please use the following links to report a bug to your distribution instead: Ubuntu | Fedora | Arch | Mint | Debian | Red Hat | OpenSUSE | SUSE"

I am not compiling my own kernel. I'm using the CentOS Stream 9 kernel. So you guys really need to file the bug against upstream. Why are you so unwilling to acknowledge that this is a bug? It seems pretty obvious.

Regards,
Andy

(In reply to Andrew Schorr from comment #12)
> Why are you so unwilling to acknowledge that this is a bug? It seems pretty
> obvious.

OK, let me discuss this upstream and see how to fix it.

Hangbin

Hi Andrew,

Here is the discussion[1] I had with the bonding maintainer. He doesn't recommend nesting an LACP bond inside an active-backup bond. Could you use the LACP bond directly?

[1] https://lore.kernel.org/netdev/ZFm7Hwz6cqEkVB1g@Laptop-X1/T/#mf8433b43239f5cb843fdc974565bb41b5b94ce5f

Thanks,
Hangbin

Hi Hangbin,

Thank you for pursuing this upstream. I selected an active-backup bond on top of 802.3ad for 2 reasons:

1. I am using arp_ip_target to ensure that the network interface is truly connected to the other network devices that it needs to talk to. Perhaps I am mistaken, but I believe that 802.3ad bonds use only MII monitoring to determine whether the link is up.
2. I have 2 10-gig links in a port-channel group on an Arista switch as the primary link, and the backup is a single, "normal", 1-gig connection to a Cisco switch. I would also like to be able to use PXE booting over that normal 1-gig link, so I did not want to set it up in an LACP port-channel.

While I was aware that a Linux 802.3ad bond could contain multiple aggregators, I was not aware of this concept of having a non-LACP normal NIC as an aggregator inside the 802.3ad bond. If that truly works, then it would address my concern #2, but it doesn't change issue #1 regarding ARP monitoring. Where is this idea of having a non-LACP NIC documented?
I don't see any mention of it in the Bonding Driver HOWTO: https://www.kernel.org/doc/Documentation/networking/bonding.txt

If I want to use that config, how do I enable it? Looking at the source code in drivers/net/bonding/bond_3ad.c, I see this code snippet:

    if (port->actor_oper_port_key & AD_DUPLEX_KEY_MASKS) /* if port is full duplex */
        port->aggregator->is_individual = false;
    else
        port->aggregator->is_individual = true;

This seems to suggest that I need to disable duplex in the ad_user_port_key to select this "individual" mode, but I'm confused about that, because "ad_user_port_key" seems to be a property of the bonding master, not the slaves. So how do I actually use this "individual" non-LACP normal NIC in an 802.3ad bonding group? Does it somehow work automatically with no configuration? Normally, I would expect the Linux system to attempt LACP with the switch and then refuse to bring up the link if the switch does not respond with LACP packets.

In any case, I still prefer to have an active-backup bond on top of 802.3ad because that allows me to use ARP monitoring. And look: it's not the end of the world if the speed and duplex info for the active-backup bond are incorrect. However, it would be preferable if they could just be set to -1 or N/A instead of sometimes having bogus values. It seems like it should be possible to detect that one or more of the slaves is a bond and then just set speed and duplex to N/A.

Please do not disable bonds of bonds in the kernel. I find it to be a useful feature. But also, please improve the 802.3ad bonding documentation to explain more clearly that one can use multiple aggregators (it took me a while to figure this out when I was planning my configuration), and also to explain this concept of having an individual "normal" non-LACP aggregator inside the bond and how that works.
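The snippet above tests only bit 0 of the port's operational key. The sketch below splits a port key into its fields, assuming the layout given for the `ad_user_port_key` option in the kernel bonding documentation (bit 0 duplex, bits 1-5 speed, bits 6-15 user-defined). If that layout is right, `ad_user_port_key` only supplies bits 6-15 and does not set the duplex bit the snippet checks. `decode_port_key` is a hypothetical helper, and the speed field is printed as a raw driver code rather than converted to Mb/s.

```shell
#!/bin/sh
# Hypothetical helper: split an 802.3ad port key into its assumed fields:
#   bit 0      duplex (1 = full duplex)
#   bits 1-5   speed code (internal driver value, printed raw)
#   bits 6-15  user-defined key (what ad_user_port_key would supply)
decode_port_key() {
    key=$1
    duplex=$(( key & 0x1 ))
    speed_code=$(( (key & 0x3E) >> 1 ))
    user_key=$(( (key & 0xFFC0) >> 6 ))
    echo "duplex_bit=$duplex speed_code=$speed_code user_key=$user_key"
}
```

For example, `decode_port_key 21` (the "Actor Key: 21" in the /proc/net/bonding outputs earlier in this report) prints `duplex_bit=1 speed_code=10 user_key=0`, i.e. a full-duplex port with no user-defined key bits set.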
Thanks,
Andy

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2394