Bug 2165802
| Summary: | bonding active-backup priority with arping is not working | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | tbsky <tbskyd> |
| Component: | kernel | Assignee: | Jonathan Toppins <jtoppins> |
| kernel sub component: | Bonding | QA Contact: | LiLiang <liali> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | jbainbri, jtoppins, network-qe |
| Version: | 9.1 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-02-23 23:35:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
I think this is because the non-primary slave will sometimes receive the ARP reply first, and it then becomes the active slave. If the primary slave receives the ARP reply first, it is selected as the active slave.
Since you are using two directly connected NICs, this has a chance of happening in your environment.
I don't know why this issue doesn't happen with team... This is just my guess.
+------+
10G NIC------+ |
| |
| SW +------arp_ip_target
| |
1G NIC-------+ |
+------+
If you use an environment like this, can you reproduce the problem?
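For reference, one way to check this hypothesis is to watch which member receives the ARP replies and which member the bond then selects. A minimal sketch, using bond99/enp1s0 from the original description; the 1G member's name is not given in this report, so eth1 below is a hypothetical placeholder:

```
# Watch ARP traffic arriving on each bond member (one per terminal):
tcpdump -eni enp1s0 arp
tcpdump -eni eth1 arp      # hypothetical name for the 1G member

# Watch which member bonding currently considers active:
watch -n1 'grep -E "Primary Slave|Currently Active Slave" /proc/net/bonding/bond99'
```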
Hi:

I don't have such an environment to test, but I created two qemu VMs, each with two NICs connected to the same bridge, and the active-backup arping priority works correctly there.

The two nodes need to be connected directly so they are not affected by a failed switch. Previously teamd worked fine. I guess I need to use miimon now, since bonding with arping works fine across a switch/bridge. So it seems this is a feature, not a bug?

(In reply to tbsky from comment #2)
> Hi:
>
> I don't have such an environment to test, but I created two qemu VMs, each
> with two NICs connected to the same bridge, and the active-backup arping
> priority works correctly there.
>
> The two nodes need to be connected directly so they are not affected by a
> failed switch. Previously teamd worked fine. I guess I need to use miimon
> now, since bonding with arping works fine across a switch/bridge. So it
> seems this is a feature, not a bug?

I can't confirm this. Let the developer have a look too.

Hello,

I have not gotten a chance to test a Fedora kernel or something really close to upstream. What I can say for now is that bonding requires the use of either miimon or arpmon to manage the bond link state, and in this case the fail-over process for active-backup mode. If neither of these monitors is used, there are several cases where the bond interface would get stuck down when a member link changes state. In older RHELs (RHEL-7, for example) there was no default monitor selected; now in upstream (I will have to verify when, in RHEL-8 and -9) miimon is selected by default if no monitor selection is made during creation of the bond. In bonding there is no way to use an external process like arping to monitor the link state of a member port; the only monitoring options are bond member link state (miimon) or arpmon.

I will note, assuming I am understanding the description correctly, that incoming L2 management traffic (in this case an ARP request) causing the active bond member to switch sounds like a bug. It should not be possible for an external entity to cause a state change in bonding unless that entity was controlling the link state of a bond member. This is what I need to test upstream and clarify my understanding about.

Hope this helps for now.

-Jon
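For reference, the monitor a bond is actually using can be read back at runtime. A minimal sketch, assuming the bond is named bond99 as elsewhere in this report:

```
# A non-zero miimon means the MII monitor is in use; a non-zero arp_interval means the ARP monitor is.
cat /sys/class/net/bond99/bonding/miimon
cat /sys/class/net/bond99/bonding/arp_interval
cat /sys/class/net/bond99/bonding/arp_ip_target

# The same information, plus the currently active member:
grep -E 'Bonding Mode|MII Polling Interval|ARP Polling Interval|Currently Active Slave' /proc/net/bonding/bond99
```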
Hi:

For testing the direct-connection situation, I created two bridges for the two VMs, so it connects like:
+-------+
| |
eth0 ---+ br0 +--- eth0
| |
+-------+
+-------+
| |
eth1 ---+ br1 +--- eth1
| |
+-------+
The behavior is like the physically direct-connected nodes: fault tolerance still works, but the priority selection/recovery does not, and running `arping -I ethX <peer IP>` helps the bonding find the correct priority link.
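For reference, a minimal sketch of the two-bridge host setup drawn above. The bridge names follow the diagram; the tap names are hypothetical, since in practice virt-manager/libvirt attaches the VM NICs to the bridges:

```
# Two independent host bridges, one per "direct" link:
ip link add br0 type bridge
ip link add br1 type bridge
ip link set br0 up
ip link set br1 up

# Each VM's first NIC goes to br0 and its second NIC to br1 (tap names hypothetical):
ip link set vnet0 master br0 up
ip link set vnet1 master br1 up
ip link set vnet2 master br0 up
ip link set vnet3 master br1 up
```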
tbsky,
To clarify, it appears in the original description you are saying the ARP monitor is not working for you?
May I assume your logical network looks something like this?
Node A Node B
10.255.99.1/24 10.255.99.2/24
/---------------\ /---------------\
| | /-------\ | |
| /----| L1 |SWITCH1| L4 |----\ |
| | +------+ +------+ | |
| | B | | | | B | |
| | O | \-------/ | O | |
| | N | | N | |
| | D | /-------\ | D | |
| | 0 | | | | 0 | |
| | +------+ +------+ | |
| \----| L2 |SWITCH2| L3 |----/ |
| | \-------/ | |
\---------------/ \---------------/
Providing an `ip -d link show` for each VM would help clarify how things are connected.
Hi:
Yes, my current configuration on the VMs is just like what you drew.
Below are the `ip -d link show` results on the two nodes:
Node A (10.255.99.1):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bond_slave state BACKUP mii_status DOWN link_failure_count 1 perm_hwaddr 52:54:00:59:78:21 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio0
altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:81:7d:11 promiscuity 0 minmtu 68 maxmtu 65535
bond_slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr 52:54:00:81:7d:11 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio5
altname enp7s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bond mode active-backup active_slave eth1 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 200 arp_missed_max 2 arp_ip_target 10.255.99.2 arp_validate none arp_all_targets any primary eth1 primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode none numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535
Node B (10.255.99.2):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bond_slave state BACKUP mii_status GOING_DOWN link_failure_count 1 perm_hwaddr 52:54:00:42:e0:b7 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio0
altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:23:6c:9d promiscuity 0 minmtu 68 maxmtu 65535
bond_slave state ACTIVE mii_status UP link_failure_count 1 perm_hwaddr 52:54:00:23:6c:9d queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio8
altname enp10s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bond mode active-backup active_slave eth1 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 200 arp_missed_max 2 arp_ip_target 10.255.99.1 arp_validate none arp_all_targets any primary eth1 primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode none numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535
Created attachment 1945773 [details]
network recreate
This recreate is not able to reproduce the specific issue the reporter is having.
tbsky,

The interface names dumped in comment #7 do not match the interface names used in the original description, so I am going to assume `eth1` is equivalent to `enp1s0`. I would recommend the following changes to the bond configuration.

1. Set `active_slave` to `eth1`.

   This will set eth1 as the active slave regardless of the order NetworkManager adds slaves. By default bonding attempts to use the first slave added to the bond as the active slave.

2. Set `arp_validate` to `active`, given your network configuration and wanting to ping past the first switch. From the kernel bonding documentation:

   Enabling validation causes the ARP monitor to examine the incoming ARP requests and replies, and only consider a slave to be up if it is receiving the appropriate ARP traffic. For an active slave, the validation checks ARP replies to confirm that they were generated by an arp_ip_target. Since backup slaves do not typically receive these replies, the validation performed for backup slaves is on the broadcast ARP request sent out via the active slave. It is possible that some switch or network configurations may result in situations wherein the backup slaves do not receive the ARP requests; in such a situation, validation of backup slaves must be disabled. The validation of ARP requests on backup slaves is mainly helping bonding to decide which slaves are more likely to work in case of the active slave failure, it doesn't really guarantee that the backup slave will work if it's selected as the next active slave.

   Validation is useful in network configurations in which multiple bonding hosts are concurrently issuing ARPs to one or more targets beyond a common switch. Should the link between the switch and target fail (but not the switch itself), the probe traffic generated by the multiple bonding instances will fool the standard ARP monitor into considering the links as still up. Use of validation can resolve this, as the ARP monitor will only consider ARP requests and replies associated with its own instance of bonding.

3. You can try setting `primary_reselect` to `better`.

   Given that the bandwidth of the two slaves is different, I am assuming you want to prefer the 10G link over the 1G link. This will only force slave reselection when a better slave is available.
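For reference, a hedged sketch of how these three changes might be applied. The nmcli commands are one possible way and are an assumption about how the connection is managed; the option values mirror node A's bond settings from the `ip -d link show` output above (node B would use arp_ip_target=10.255.99.1):

```
# 1. Force the current active member at runtime (runtime-only knob, not persisted):
echo eth1 > /sys/class/net/bond99/bonding/active_slave

# 2. and 3. Persist arp_validate=active and primary_reselect=better in the profile.
#    Note: bond.options is replaced as a whole, so the existing options are repeated here.
nmcli connection modify bond99 bond.options \
    "mode=active-backup,arp_interval=200,arp_ip_target=10.255.99.2,primary=eth1,arp_validate=active,primary_reselect=better"
nmcli connection up bond99
```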
Hi:

The original description was on production physical machines, which cannot often be used for testing, so I created two virtual machines which reproduce the same behavior. The VMs are described in comment #2 and comment #5; the test commands/architecture are now based on the VMs in comment #5.

I tried "arp_validate=active" but the situation is the same. I cannot test "primary_reselect=better" because there is no NIC speed in a qemu VM, and since I already know I always want eth1 as long as it is alive, I think maybe I don't need it to judge which one is better. "active_slave=eth1" will change to the real active NIC when the bonding status changes, so I don't know why I need it; maybe it helps when initializing the bond, but under the VMs the primary NIC selection is always correct after reboot. The problem is that when I force the primary NIC down and up again (by downing and upping the virtual NIC link in virt-manager), it won't be selected again unless I issue the command: arping -I eth1 10.255.99.2. As you said, this is the strange part.

BTW: I was hoping I could use miimon instead of arpmon, but after heavy testing (rebooting two physical machines 200 times by script) I found some NIC drivers may sometimes miss the real NIC link status at boot (the NIC goes up and down frequently at that moment). When one machine thinks the NIC link is down but the peer thinks the NIC link is up, the whole active-backup bonding is broken and unusable. In my testing arpmon won't let that happen; it always keeps the bonding usable, although not always on the best link. So I will keep using arpmon and hope someday bonding will become as usable as teamd.

The virtio-net driver is not great for testing bonding because it doesn't present as a complete Ethernet device; it doesn't report a speed by default. LACP, active-backup, etc. will not function completely properly in all cases.

I have posted a network recreate script attempting to simulate the setup. The difference is that the script uses net namespaces and veths. When link L1 is brought down, `ip netns exec switch0 ip link set dev eth0 down`, one will observe both bonds fail over to the backup link. And when one brings L1 back up, `ip netns exec switch0 ip link set dev eth0 up`, one will observe the bonds prefer the primary (eth0) and eth0 becomes the active link.

Miimon will not work for this particular network setup because the network does not have multiple paths between the hosts; Host A can only use path one or path two. Therefore, an active protocol such as the ARP monitor must be used, otherwise one side (Host A) may detect a local link problem and fail over to eth1, but Host B will have no notification of this (all its links are fine) and so Host B will continue to use eth0 as its active port. One can observe this by modifying the bond creation to use miimon in the recreate script and then failing L1 as described above. Notice host A (ns1) fails over to eth1 but host B (ns2) is still using eth0. Neither host can talk to the other because the active port on both hosts has no path to the other host. To allow for the use of miimon one would need to create an inter-switch link (ISL) between the two switches.

I would also recommend adding the bonding option `num_grat_arp 5` so the switches between the two hosts correctly learn the new path when the bond for either host fails over.
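For reference, a minimal sketch of a namespace/veth recreate along the lines described above. This is not the attached recreate script; the namespace, port, and option names below are assumptions:

```
#!/bin/bash
# Sketch only: two "hosts" (ns1, ns2) each bond eth0+eth1 in active-backup with the
# ARP monitor; two "switches" (switch0, switch1) are namespaces holding a bridge each.
set -e

for ns in ns1 ns2 switch0 switch1; do ip netns add "$ns"; done

# L1: ns1/eth0 <-> switch0, L2: ns1/eth1 <-> switch1,
# L4: ns2/eth0 <-> switch0, L3: ns2/eth1 <-> switch1
ip link add eth0 netns ns1 type veth peer name port1 netns switch0
ip link add eth1 netns ns1 type veth peer name port1 netns switch1
ip link add eth0 netns ns2 type veth peer name port2 netns switch0
ip link add eth1 netns ns2 type veth peer name port2 netns switch1

# Each "switch" is just a bridge with its two ports
for sw in switch0 switch1; do
    ip netns exec "$sw" ip link add br0 type bridge
    ip netns exec "$sw" ip link set br0 up
    ip netns exec "$sw" ip link set port1 master br0 up
    ip netns exec "$sw" ip link set port2 master br0 up
done

# Active-backup bonds using the ARP monitor, preferring eth0 as primary
ip netns exec ns1 ip link add bond0 type bond mode active-backup \
    arp_interval 200 arp_ip_target 10.255.99.2 primary eth0 num_grat_arp 5
ip netns exec ns2 ip link add bond0 type bond mode active-backup \
    arp_interval 200 arp_ip_target 10.255.99.1 primary eth0 num_grat_arp 5

for ns in ns1 ns2; do
    ip netns exec "$ns" ip link set eth0 master bond0
    ip netns exec "$ns" ip link set eth1 master bond0
    ip netns exec "$ns" ip link set bond0 up
done
ip netns exec ns1 ip addr add 10.255.99.1/24 dev bond0
ip netns exec ns2 ip addr add 10.255.99.2/24 dev bond0

# Fail and restore "L1" (ns1's path through switch0), then watch the active member:
#   ip netns exec switch0 ip link set port1 down
#   ip netns exec switch0 ip link set port1 up
#   ip netns exec ns1 cat /proc/net/bonding/bond0
```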
Hello tbsky,

My name is Jamie from Red Hat Asia Pacific. I am a senior support resource assisting with this issue.

Please note that Bugzilla is not a support channel. It is for reporting and tracking quantified bugs in Red Hat source code. If you'd like Red Hat's assistance troubleshooting an issue, we're very happy to provide that service via our technical support entitlements. I could not find your email address attached to a supported account (or any Customer Portal account). If you have a paid support entitlement, please do open a support case in future.

If you do not wish to purchase support, the free RHEL entitlement still provides you access to our knowledgebase, product documentation, and to our Discussions community where you can talk to other Red Hat software users and some employees.

You can search on our Customer Portal at: https://access.redhat.com/
You can open a community Discussions thread at: https://access.redhat.com/discussions/

For your reference:
Can I get technical support using Bugzilla if I do not have a Red Hat support entitlement?
https://access.redhat.com/articles/1756

For this specific issue, you seem to be directly connecting your systems without a switch in between. As you said:

(In reply to tbsky from comment #0)
> the two nodes are connected directly with a 10G and a 1G NIC

(In reply to tbsky from comment #2)
> the two nodes need to be connected directly so they are not affected by a failed switch

Please note this is not a supported usage of bonding. We require a network switch in between systems. One possible result of directly-connected or crossover bonding is incorrect failover state on each system resulting in no traffic, exactly as you report. If you wish to protect against a switch as a single point of failure, the enterprise industry practice is to use multiple redundant switches.

For your reference:
Is bonding supported with direct connection using crossover cables?
https://access.redhat.com/solutions/202583

If you have further queries, we look forward to hearing from you via a support case with your support entitlement, or on the Discussions community.

Regards,
Jamie Bainbridge
Red Hat Asia Pacific
Hi:

On RHEL 7/8 I used teamd with arping to create an active-backup connection between two nodes. Since teamd is deprecated, I switched to bonding on RHEL 9 with a similar configuration, but it is not working correctly.

The two nodes are connected directly with a 10G and a 1G NIC. I create an active-backup bond and set the 10G NIC as the primary interface. The NetworkManager configuration (/etc/NetworkManager/system-connections/bond99.nmconnection) is like below:

node-A:

[connection]
id=bond99
type=bond
interface-name=bond99

[bond]
arp_interval=200
arp_ip_target=10.255.99.2
mode=active-backup
primary=enp1s0

[ipv4]
address1=10.255.99.1/24
method=manual

[ipv6]
method=disabled

[802-3-ethernet]
mtu=9000

node-B:

[connection]
id=bond99
type=bond
interface-name=bond99

[bond]
arp_interval=200
arp_ip_target=10.255.99.1
mode=active-backup
primary=enp1s0

[ipv4]
address1=10.255.99.2/24
method=manual

[ipv6]
method=disabled

[802-3-ethernet]
mtu=9000

Interface enp1s0 is the 10G link, but it is not always selected as the active slave. When it is not, if I issue a command on node-A manually:

>arping 10.255.99.2 -I enp1s0

then enp1s0 suddenly becomes the active slave again.

If I use miimon instead of arping for active-backup, then everything is fine; the primary selection and recovery work as expected. So the configuration below works fine:

node-A:

[connection]
id=bond99
type=bond
interface-name=bond99

[bond]
miimon=100
mode=active-backup
primary=enp1s0

[ipv4]
address1=10.255.99.1/24
method=manual

[ipv6]
method=disabled

[802-3-ethernet]
mtu=9000
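For reference, a quick way to check which member either configuration actually selected (names taken from the keyfiles above):

```
# Configured primary vs. currently active member, as reported by the kernel:
grep -E 'Primary Slave|Currently Active Slave' /proc/net/bonding/bond99

# The same information from iproute2 (look for the 'active_slave' field):
ip -d link show bond99
```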