Description of problem:
The preferred primary setting is lost when you bring down the primary slave, and the preference is not restored after you bring the slave back up.

Version-Release number of selected component (if applicable):

How reproducible:
Every time, by simply resetting the primary slave.

Steps to Reproduce:
1. Set up one of the slaves as the primary slave.
2. cat /proc/net/bonding/bond? to confirm that it is set.
3. ifdown the primary slave.
4. ifup the primary slave.
5. The preferred setting is not restored.
6. An ifdown and ifup of the whole bond will restore the setting.

Actual results:
The preferred primary setting is lost.

Expected results:
The preferred primary setting should be restored once the primary slave comes back up.

Additional info:
While this may be a bug, the 'primary' feature is designed to be used when an interface actually goes down due to a link failure, not when it is simply added to and removed from the bond. Can you confirm that the link-failure case still works?
Which bonding mode do you use?
(In reply to comment #1)
> While this may be a bug, the 'primary' feature is designed to be used when an
> interface actually goes down due to a link failure not simply adding it and
> removing it from the bond. Can you confirm that still works?

I disagree here. The 'primary' option is set once (at module load) and it should be honored every time, even if the interface is temporarily removed from the bond and added back. In fact, when you unplug the primary interface's cable and plug it back in, the active slave is switched back to the primary interface, so the link-failure case works. This issue appears only in tlb and alb modes (not in active-backup).

The following upstream patch fixes this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5a29f7893fbe681f1334285be7e41e56f0de666c
What I found is that this code in bond_enslave() in bond_main.c:

	if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
		/* if there is a primary slave, remember it */
		if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
			bond->primary_slave = new_slave;
		}
	}

is a no-op because bond->params.primary[0] is always zero; bond->params.primary is not set at all.
(In reply to comment #4)
> What I found is that the code in bond_enslave() in bond_main.c
>
>	if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
>		/* if there is a primary slave, remember it */
>		if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
>			bond->primary_slave = new_slave;
>		}
>	}
>
> is a noop because bond->params.primary[0] is always zero.

Not true. It's a no-op only when no primary slave is specified.

> bond->params.primary is not set at all.

No, no. See bond_check_params(). This is not the problem; the problem is described in comment #3.
in kernel-2.6.18-165.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
setting back to assigned because this patch didn't solve the problem.
Tested with kernel 2.6.18-164.1.1.el5; the preferred primary setting is still lost.

[root@nec-em20 ~]# uname -a
Linux nec-em20.rhts.bos.redhat.com 2.6.18-164.1.1.el5 #1 SMP Mon Sep 7 06:13:28 EDT 2009 i686 i686 i386 GNU/Linux

[root@nec-em20 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: eth0
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c6

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c7

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:db:2f:93:7f

[root@nec-em20 ~]# ifdown eth0
[root@nec-em20 ~]# ifup eth0
[root@nec-em20 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None          <==========
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c7

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:db:2f:93:7f

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c6

4 NICs with the e1000e driver:
01:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
01:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

bond config: bond0 with slaves eth0, eth1, eth3; mode is balance-alb.
in kernel-2.6.18-168.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
*** Bug 524233 has been marked as a duplicate of this bug. ***
Any update on getting this into the 5.4 kernel? My customer is running with mode 6 and just discovered this issue.
(In reply to comment #15)
> Any update on getting this into the 5.4 kernel? My customer is running with
> mode 6 and just discovered this issue.

See bz5517971. This was addressed in the 5.4.z kernel 2.6.18-164.4.1.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html