Description of problem:
The preferred primary setting is lost when you bring down the primary slave, and the preference is not restored after you bring the slave back up.

Version-Release number of selected component (if applicable):

How reproducible:
Every time, by simply resetting the primary slave.

Steps to Reproduce:
1. Set up one of the slaves as the primary slave.
2. cat /proc/net/bonding/bond? to confirm that it is set.
3. ifdown the primary slave.
4. ifup the primary slave.
5. The preferred setting is not restored.
6. An ifdown and ifup of the whole bond will restore the setting.

Actual results:
The preferred primary setting is lost.

Expected results:
The preferred primary setting should be restored once the primary slave comes back up.

Additional info:
While this may be a bug, the 'primary' feature is designed to be used when an interface actually goes down due to a link failure, not when it is simply added to and removed from the bond. Can you confirm that the link-failure case still works?
Which bonding mode do you use?
(In reply to comment #1)
> While this may be a bug, the 'primary' feature is designed to be used when an
> interface actually goes down due to a link failure not simply adding it and
> removing it from the bond. Can you confirm that still works?

I disagree here. The 'primary' option is set once (at module load) and it should be honored every time, even if the interface is temporarily removed from the bond and added back. In fact, when you unplug the primary interface's cable and plug it back in, the active slave is switched back to the primary interface, so the link-failure case works. This issue appears only in tlb and alb modes (not in active-backup).

The following upstream patch fixes this issue:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5a29f7893fbe681f1334285be7e41e56f0de666c
What I found is that this code in bond_enslave() in bond_main.c:

	if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
		/* if there is a primary slave, remember it */
		if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
			bond->primary_slave = new_slave;
		}
	}

is a no-op because bond->params.primary[0] is always zero; bond->params.primary is not set at all.
(In reply to comment #4)
> What I found is that the code in bond_enslave() in bond_main.c
>
>	if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
>		/* if there is a primary slave, remember it */
>		if (strcmp(bond->params.primary, new_slave->dev->name) == 0) {
>			bond->primary_slave = new_slave;
>		}
>	}
>
> is a noop because bond->params.primary[0] is always zero.

Not true. It's a no-op only when no primary slave is specified.

> bond->params.primary is not set at all.

No, no. See bond_check_params(). This is not the problem; the problem is described in comment #3.
in kernel-2.6.18-165.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
setting back to assigned because this patch didn't solve the problem.
Tested with kernel 2.6.18-164.1.1.el5; the preferred primary setting is still lost.

[root@nec-em20 ~]# uname -a
Linux nec-em20.rhts.bos.redhat.com 2.6.18-164.1.1.el5 #1 SMP Mon Sep 7 06:13:28 EDT 2009 i686 i686 i386 GNU/Linux

[root@nec-em20 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: eth0
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c6

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c7

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:db:2f:93:7f

[root@nec-em20 ~]# ifdown eth0
[root@nec-em20 ~]# ifup eth0
[root@nec-em20 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: adaptive load balancing
Primary Slave: None          <==========
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c7

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:db:2f:93:7f

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:14:2b:c6

4 NICs with the e1000e driver:
01:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
01:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

bond config: bond0 with slaves eth0, eth1, eth3; mode is balance-alb.
in kernel-2.6.18-168.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
*** Bug 524233 has been marked as a duplicate of this bug. ***
Any update on getting this into the 5.4 kernel? My customer is running with mode 6 and just discovered this issue.
(In reply to comment #15)
> Any update on getting this into the 5.4 kernel? My customer is running with
> mode 6 and just discovered this issue.

See bz5517971. This was addressed in the 5.4.z kernel 2.6.18-164.4.1.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html