Bug 2165802 - bonding active-backup priority with arping is not working
Summary: bonding active-backup priority with arping is not working
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jonathan Toppins
QA Contact: LiLiang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-01-31 05:54 UTC by tbsky
Modified: 2023-02-23 23:35 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-23 23:35:37 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
network recreate (3.17 KB, application/x-shellscript), 2023-02-22 17:24 UTC, Jonathan Toppins


Links
Red Hat Issue Tracker RHELPLAN-146988 (last updated 2023-01-31 05:55:38 UTC)

Description tbsky 2023-01-31 05:54:32 UTC
Hi:
   On RHEL 7/8 I used teamd with arping to create an active-backup connection between two nodes. Since teamd is deprecated, I switched to bonding on RHEL 9 with a similar configuration, but it is not working correctly.

The two nodes are connected directly with a 10G and a 1G NIC. I created an active-backup bond and set the 10G interface as primary. The NetworkManager configuration (/etc/NetworkManager/system-connections/bond99.nmconnection) is below:

node-A:
[connection]
id=bond99
type=bond
interface-name=bond99
[bond]
arp_interval=200
arp_ip_target=10.255.99.2
mode=active-backup
primary=enp1s0
[ipv4]
address1=10.255.99.1/24
method=manual
[ipv6]
method=disabled
[802-3-ethernet]
mtu=9000


node-B:
[connection]
id=bond99
type=bond
interface-name=bond99
[bond]
arp_interval=200
arp_ip_target=10.255.99.1
mode=active-backup
primary=enp1s0
[ipv4]
address1=10.255.99.2/24
method=manual
[ipv6]
method=disabled
[802-3-ethernet]
mtu=9000


Interface enp1s0 is the 10G link, but it is not always the primary (active) interface. When it is not primary, if I issue a command on node-A manually:

>arping 10.255.99.2 -I enp1s0

then enp1s0 suddenly becomes the primary (active) interface.

If I use miimon instead of ARP monitoring for active-backup, everything is fine:
primary selection and recovery work as expected. So the configuration below works fine:

node-A:
[connection]
id=bond99
type=bond
interface-name=bond99
[bond]
miimon=100
mode=active-backup
primary=enp1s0
[ipv4]
address1=10.255.99.1/24
method=manual
[ipv6]
method=disabled
[802-3-ethernet]
mtu=9000

Comment 1 LiLiang 2023-01-31 07:09:36 UTC
I think this is because the non-primary slave will sometimes receive the ARP reply first, and then it becomes the active slave.

If the primary slave receives the ARP reply first, it will be selected as the active slave.

Since you are using two directly connected NICs, this issue has a chance to happen in your environment.

I don't know why this issue doesn't happen with team...

This is just my guess..
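
One hedged way to check this guess is to capture ARP on both bond slaves and compare which one sees the reply first (enp1s0 is the 10G slave from the description; ETH_1G is a placeholder for the actual 1G slave name):

   timeout 5 tcpdump -n -e -i enp1s0 arp > /tmp/arp-10g.txt &
   timeout 5 tcpdump -n -e -i ETH_1G arp > /tmp/arp-1g.txt &
   wait
   # Compare the timestamps of the first reply seen on each interface.
   head -n 3 /tmp/arp-10g.txt /tmp/arp-1g.txt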


             +------+
10G NIC------+      |
             |      |
             |  SW  +------arp_ip_target
             |      |
1G NIC-------+      |
             +------+

If you use an environment like this, can you reproduce the problem?

Comment 2 tbsky 2023-01-31 09:38:47 UTC
Hi:

   I don't have such an environment to test. But I created two QEMU VMs, each with two NICs connected to the same bridge, and the active-backup arping priority works correctly there.
 
   The two nodes need to be connected directly so they are not affected by a failed switch. Previously teamd worked fine. But I guess I need to use miimon now, since bonding with ARP monitoring works fine across a switch/bridge. So it seems like a feature, not a bug?

Comment 3 LiLiang 2023-01-31 09:50:23 UTC
(In reply to tbsky from comment #2)
> Hi:
> 
>    I don't have such environment to test. but I create two qemu VM, each
> have two nics and connect to the same bridge. the active-backup arping
> priority works correctly.
>  
>    the two nodes need to connect directly so it won't affect by failed
> switch. previously teamd works fine. but I guess I need to use miimon now
> since bonding arping is working fine across switch/bridge. so it seems a
> feature not a bug?

I can't confirm this. Let's have a developer take a look too.

Comment 4 Jonathan Toppins 2023-01-31 22:06:22 UTC
Hello,

I have not yet had a chance to test a Fedora kernel or something really close to upstream. What I can say for now is that bonding requires either miimon or the ARP monitor (arpmon) to manage the bond link state, and in this case the fail-over process for active-backup mode. If neither monitor is used, there are several cases where the bond interface can get stuck down when a member link changes state. In older RHELs (RHEL 7, for example) no monitor was selected by default; now, upstream (I will have to verify when this changed for RHEL 8 and 9), miimon is selected by default if no monitor is chosen when the bond is created. In bonding there is no way to use an external process like arping to monitor the link state of a member port; the only monitoring options are member link state (miimon) or arpmon.
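
As a hedged aside, a quick way to confirm which monitor a given bond is actually running (bond99 is the name used in this report; the bonding sysfs attributes below are standard):

   grep -iE 'mii|arp' /proc/net/bonding/bond99      # summary of the configured monitor
   cat /sys/class/net/bond99/bonding/miimon         # 0 means miimon is disabled
   cat /sys/class/net/bond99/bonding/arp_interval   # >0 means the ARP monitor is in use
   cat /sys/class/net/bond99/bonding/arp_ip_target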

I will note that, assuming I understand the description correctly, incoming L2 management traffic (in this case an ARP request) causing the active bond member to switch sounds like a bug. It should not be possible for an external entity to cause a state change in bonding unless that entity controls the link state of a bond member. This is what I need to test upstream and clarify my understanding of.

Hope this helps for now.
-Jon

Comment 5 tbsky 2023-02-01 01:38:36 UTC
hi:
   To test the direct-connection situation, I created two bridges for the two VMs, so they are connected like this:
   


             +-------+
             |       |
     eth0 ---+  br0  +--- eth0
             |       |  
             +-------+
    
             
             +-------+
             |       |
     eth1 ---+  br1  +--- eth1
             |       |  
             +-------+


   The behavior is the same as with physically direct-connected nodes: fault tolerance still works, but primary selection/recovery does not. And "arping -I ethX <peer IP>" helps the bond find the correct primary link.
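
For reference, a hedged sketch of how such a two-bridge VM setup can be built on the hypervisor (the bridge names br0/br1 match the diagram; the VM names vmA/vmB are hypothetical placeholders):

   # One isolated bridge per "cable", not connected to each other.
   ip link add br0 type bridge && ip link set br0 up
   ip link add br1 type bridge && ip link set br1 up
   # Attach one NIC of each VM to each bridge.
   virsh attach-interface --domain vmA --type bridge --source br0 --model virtio --config
   virsh attach-interface --domain vmA --type bridge --source br1 --model virtio --config
   virsh attach-interface --domain vmB --type bridge --source br0 --model virtio --config
   virsh attach-interface --domain vmB --type bridge --source br1 --model virtio --config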

Comment 6 Jonathan Toppins 2023-02-06 14:47:26 UTC
tbsky,

To clarify: in the original description, are you saying the ARP monitor is not working for you?

May I assume your logical network looks something like this?

     Node A                                 Node B
 10.255.99.1/24                        10.255.99.2/24
/---------------\                     /---------------\
|               |      /-------\      |               |
|          /----| L1   |SWITCH1|   L4 |----\          |
|          |    +------+       +------+    |          |
|          | B  |      |       |      | B  |          |
|          | O  |      \-------/      | O  |          |
|          | N  |                     | N  |          |
|          | D  |      /-------\      | D  |          |
|          | 0  |      |       |      | 0  |          |
|          |    +------+       +------+    |          |
|          \----| L2   |SWITCH2|   L3 |----/          |
|               |      \-------/      |               |
\---------------/                     \---------------/


Providing an `ip -d link show` for each VM would help clarify how things are connected.

Comment 7 tbsky 2023-02-07 07:59:27 UTC
Hi:
   Yes, my current configuration in the VMs is just like what you drew.

below are "ip -d link show" result at two nodes:

nodeA(10.255.99.1)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond_slave state BACKUP mii_status DOWN link_failure_count 1 perm_hwaddr 52:54:00:59:78:21 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio0
    altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:81:7d:11 promiscuity 0 minmtu 68 maxmtu 65535
    bond_slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr 52:54:00:81:7d:11 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio5
    altname enp7s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:59:78:21 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode active-backup active_slave eth1 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 200 arp_missed_max 2 arp_ip_target 10.255.99.2 arp_validate none arp_all_targets any primary eth1 primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode none numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535



nodeB(10.255.99.2)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond_slave state BACKUP mii_status GOING_DOWN link_failure_count 1 perm_hwaddr 52:54:00:42:e0:b7 queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus virtio parentdev virtio0
    altname enp1s0
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond99 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:23:6c:9d promiscuity 0 minmtu 68 maxmtu 65535
    bond_slave state ACTIVE mii_status UP link_failure_count 1 perm_hwaddr 52:54:00:23:6c:9d queue_id 0 addrgenmode none numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 6553
5 parentbus virtio parentdev virtio8
    altname enp10s0
4: bond99: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:42:e0:b7 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    bond mode active-backup active_slave eth1 miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 200 arp_missed_max 2 arp_ip_target 10.255.99.1 arp_validate none arp_all_targets any primary eth1 primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_active on lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode none numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

Comment 8 Jonathan Toppins 2023-02-22 17:24:57 UTC
Created attachment 1945773 [details]
network recreate

This recreate is not able to reproduce the specific issue the reporter is having.

Comment 9 Jonathan Toppins 2023-02-22 17:59:58 UTC
tbsky,

The interface names dumped in comment #7 do not match the interface names used in the original description, so I am going to assume `eth1` is equivalent to `enp1s0`.

I would recommend the following changes to the bond configuration.

1. set `active_slave` to `eth1`
This will set eth1 as the active slave regardless of the order in which NetworkManager adds slaves. By default, bonding attempts to use the first slave added to the bond as the active slave.

2. set `arp_validate` to `active`
This is recommended given your network configuration and the desire to probe past the first switch.
From the kernel bonding documentation:
        Enabling validation causes the ARP monitor to examine the incoming
        ARP requests and replies, and only consider a slave to be up if it
        is receiving the appropriate ARP traffic.

        For an active slave, the validation checks ARP replies to confirm
        that they were generated by an arp_ip_target.  Since backup slaves
        do not typically receive these replies, the validation performed
        for backup slaves is on the broadcast ARP request sent out via the
        active slave.  It is possible that some switch or network
        configurations may result in situations wherein the backup slaves
        do not receive the ARP requests; in such a situation, validation
        of backup slaves must be disabled.

        The validation of ARP requests on backup slaves is mainly helping
        bonding to decide which slaves are more likely to work in case of
        the active slave failure, it doesn't really guarantee that the
        backup slave will work if it's selected as the next active slave.

        Validation is useful in network configurations in which multiple
        bonding hosts are concurrently issuing ARPs to one or more targets
        beyond a common switch.  Should the link between the switch and
        target fail (but not the switch itself), the probe traffic
        generated by the multiple bonding instances will fool the standard
        ARP monitor into considering the links as still up.  Use of
        validation can resolve this, as the ARP monitor will only consider
        ARP requests and replies associated with its own instance of
        bonding.

3. You can try setting `primary_reselect` to `better`
Since the bandwidth of the two slaves differs, I assume you want to prefer the 10G link over the 1G link. This option only forces slave reselection when a better slave is available. A sketch of applying these suggested options follows below.
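
A minimal sketch of applying the options above (assuming the profile is still named bond99 as in the keyfiles in the description, and a reasonably recent nmcli that supports the +bond.options append syntax; editing the keyfile's [bond] section by hand and reloading would work equally well):

   nmcli connection modify bond99 +bond.options "arp_validate=active"
   nmcli connection modify bond99 +bond.options "primary_reselect=better"
   nmcli connection up bond99
   # active_slave can also be set on the running bond via sysfs:
   echo eth1 > /sys/class/net/bond99/bonding/active_slave
   cat /sys/class/net/bond99/bonding/arp_validate   # confirm the new value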

Comment 10 tbsky 2023-02-23 06:22:27 UTC
Hi:
   The original description was for production physical machines, which cannot often be used for testing. So I created two virtual machines that reproduce the same behavior; the VMs are described in comment #2 and then comment #5. The test commands/architecture below are based on the VMs from comment #5.

   I tried "arp_validate=active" but the situation is the same. 

   I cannot test "primary_reselect=better" because the QEMU VM NICs report no speed. And since I already know I always want eth1 as long as it is alive, I think I don't need the bond to judge which one is better.

   "active_slave=eth1" will change to the real active nic when bonding status change. so I don't know why I need it. maybe it will help when initializing bonding. but under vm the primary nic selection is always correct after reboot. the problem is when I force the primary nic down and up again (by down and up the virtual nic link at virt-manager), it won't be selected again. unless I issue command:

   arping -I eth1 10.255.99.2  

  As you said, this is the strange part.
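
For reference, a hedged way to watch what the bond itself reports while toggling the link (bond99/eth1 as configured in this report):

   watch -d cat /proc/net/bonding/bond99            # "Currently Active Slave" and per-slave MII status
   cat /sys/class/net/bond99/bonding/active_slave
   cat /sys/class/net/bond99/bonding/primary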

BTW: I was hoping I could use miimon instead of arpmon, but after heavy testing (rebooting two physical machines 200 times by script), I found some NIC drivers may occasionally miss the real NIC link status at boot (the NIC goes up/down frequently at that moment). When one machine thinks the NIC link is down but the peer thinks it is up, the whole active-backup bond is broken and unusable. In my testing arpmon never lets that happen; it always keeps the bond usable, although sometimes not on the best link.

  So I will keep using arpmon, and hope that someday bonding becomes as usable as teamd.

Comment 11 Jonathan Toppins 2023-02-23 15:08:12 UTC
The virtio-net driver is not great for testing bonding because it does not present as a complete Ethernet device: it does not report a link speed by default. LACP, active-backup, etc. will not function completely properly in all cases.
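
A hedged way to check what speed the guest NIC reports, plus an assumed QEMU workaround (the speed/duplex device properties are an assumption about newer QEMU versions and may not apply to yours):

   ethtool eth1 | grep -i speed     # virtio-net usually reports "Unknown!"
   cat /sys/class/net/eth1/speed    # prints -1 (or errors) when the speed is unknown
   # Assumed QEMU option to advertise a speed so options like primary_reselect=better
   # have something to compare:
   #   -device virtio-net-pci,netdev=net0,speed=10000,duplex=full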

Comment 12 Jonathan Toppins 2023-02-23 16:13:01 UTC
I have posted a network recreate script attempting to simulate the setup. The difference is that the script uses network namespaces and veths. When link L1 is brought down, `ip netns exec switch0 ip link set dev eth0 down`, one will observe both bonds fail over to the backup link. And when L1 is brought back up, `ip netns exec switch0 ip link set dev eth0 up`, one will observe the bonds prefer the primary (eth0) and eth0 becomes the active link again.

Miimon will not work for this particular network setup because the network does not have multiple paths between the hosts; Host A can only use path one or path two. Therefore an active protocol such as the ARP monitor must be used. Otherwise, one side (Host A) may detect a local link problem and fail over to eth1, but Host B gets no notification of this (all of its links are fine), so Host B continues to use eth0 as its active port. One can observe this by modifying the bond creation in the recreate script to use miimon and then failing L1 as described above: host A (ns1) fails over to eth1 but host B (ns2) is still using eth0, and neither host can talk to the other because the active port on each host has no path to the other host.

To allow for the use of miimon one would need to create an inter-switch link (ISL) between the two switches. I would also recommend adding the bonding option `num_grat_arp 5` so the switches between the two hosts correctly learn the new path when the bond on either host fails over.
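
The attached script is not reproduced inline here; the following is only a rough, hedged sketch of this kind of namespace/veth recreate. The namespace names ns1/ns2/switch0/switch1 follow the comments above, while the switch-side port names p1-p4 and the bridge name br0 are hypothetical placeholders (the attached script apparently uses eth0/eth1 inside the switch namespaces).

   set -e
   for ns in ns1 ns2 switch0 switch1; do ip netns add "$ns"; done

   # Two independent "switches" (bridges), one per path, as in the comment 6 diagram.
   for sw in switch0 switch1; do
       ip -n "$sw" link add br0 type bridge
       ip -n "$sw" link set br0 up
   done

   # veth links: L1/L4 go through switch0, L2/L3 through switch1.
   ip link add eth0 netns ns1 type veth peer name p1 netns switch0
   ip link add eth1 netns ns1 type veth peer name p2 netns switch1
   ip link add eth0 netns ns2 type veth peer name p4 netns switch0
   ip link add eth1 netns ns2 type veth peer name p3 netns switch1
   for p in p1 p4; do ip -n switch0 link set "$p" master br0 up; done
   for p in p2 p3; do ip -n switch1 link set "$p" master br0 up; done

   # Active-backup bonds with the ARP monitor, primary eth0, one per host namespace.
   # (To see the miimon failure mode described above, swap the arp_* options for "miimon 100".)
   ip -n ns1 link add bond99 type bond mode active-backup arp_interval 200 arp_ip_target 10.255.99.2
   ip -n ns2 link add bond99 type bond mode active-backup arp_interval 200 arp_ip_target 10.255.99.1
   for ns in ns1 ns2; do
       ip -n "$ns" link set eth0 down
       ip -n "$ns" link set eth1 down
       ip -n "$ns" link set eth0 master bond99
       ip -n "$ns" link set eth1 master bond99
       ip -n "$ns" link set dev bond99 type bond primary eth0
       ip -n "$ns" link set bond99 up
       ip -n "$ns" link set eth0 up
       ip -n "$ns" link set eth1 up
   done
   ip -n ns1 addr add 10.255.99.1/24 dev bond99
   ip -n ns2 addr add 10.255.99.2/24 dev bond99

   # Fail and restore L1, then watch which slave each bond prefers:
   #   ip netns exec switch0 ip link set dev p1 down
   #   ip netns exec ns1 cat /sys/class/net/bond99/bonding/active_slave
   #   ip netns exec switch0 ip link set dev p1 up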

Comment 15 Jamie Bainbridge 2023-02-23 23:35:37 UTC
Hello tbsky,

My name is Jamie from Red Hat Asia Pacific. I am a senior support resource assisting with this issue.

Please note that Bugzilla is not a support channel. It is for reporting and tracking quantified bugs in Red Hat source code.

If you'd like Red Hat's assistance troubleshooting an issue, we're very happy to provide that service via our technical support entitlements.

I could not find your email address attached to a supported account (or any Customer Portal account). If you have a paid support entitlement, please do open a support case in future.

If you do not wish to purchase support, the free RHEL entitlement still provides you access to our knowledgebase, product documentation, and to our Discussions community where you can talk to other Red Hat software users and some employees.

You can search on our Customer Portal at: https://access.redhat.com/

You can open a community Discussions thread at: https://access.redhat.com/discussions/

For your reference:

 Can I get technical support using Bugzilla if I do not have a Red Hat support entitlement? 
 https://access.redhat.com/articles/1756

For this specific issue, you seem to be directly connecting your systems without a switch in between. As you said:

 (In reply to tbsky from comment #0)
 > the two nodes are connect directly with a 10G and a 1G nic.

 (In reply to tbsky from comment #2)
 > the two nodes need to connect directly so it won't affect by failed switch

Please note this is not a supported usage of bonding. We require a network switch in between systems.

One possible result of directly-connected or crossover bonding is incorrect failover state on each system resulting in no traffic, exactly as you report.

If you wish to protect against a switch as a single-point-of-failure, then the enterprise industry practice is to use multiple redundant switches.

For your reference:

 Is bonding supported with direct connection using crossover cables? 
 https://access.redhat.com/solutions/202583

If you have further queries, we look forward to hearing from you via a support case with your support entitlement, or on the Discussions community.

Regards,
Jamie Bainbridge
Red Hat Asia Pacific

