Bug 251902

Summary: bonding 802.3ad does not work
Product: Red Hat Enterprise Linux 5 Reporter: Dirk Nehring <dnehring>
Component: kernel Assignee: Andy Gospodarek <agospoda>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: medium Docs Contact:
Priority: low    
Version: 5.0 CC: daniel.black, jeremiah.johnson, narendra_k, peterm, tao, wwlinuxengineering
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2008-0314 Doc Type: Bug Fix
Last Closed: 2008-05-21 14:49:28 UTC
Bug Blocks: 425461, 445799    

Description Dirk Nehring 2007-08-13 11:29:40 UTC
Description of problem:
When setting up an 802.3ad bonding connection, I cannot see my partner.

Version-Release number of selected component (if applicable):


How reproducible:
/etc/modprobe.conf:
alias bond0 bonding
options bond0 miimon=100 mode=802.3ad

On our 3Com switch, I have enabled LACP for the relevant ports, but I do not see
any partner in /proc/net/bonding/bond0.
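
A quick way to check for the partner is to read the 802.3ad section of /proc/net/bonding/bond0. The Python sketch below assumes the file contains a "Partner Mac Address:" line and that an all-zero address means no LACP partner has been learned. The field label and the helper names (has_lacp_partner, check_bond) are assumptions for illustration, so compare them against what your driver version actually prints.

```python
import re

# Assumed field label; verify it against your own /proc/net/bonding/bond0.
PARTNER_RE = re.compile(
    r"^\s*Partner Mac Address:\s*([0-9A-Fa-f:]{17})\s*$", re.M
)

def has_lacp_partner(status_text):
    """Return True if the bonding status text names a non-zero partner MAC."""
    match = PARTNER_RE.search(status_text)
    if match is None:
        return False  # no 802.3ad partner line at all
    return match.group(1) != "00:00:00:00:00:00"

def check_bond(path="/proc/net/bonding/bond0"):
    """Read the status file for one bond and report whether a partner is seen."""
    with open(path) as handle:
        return has_lacp_partner(handle.read())
```

On an affected system, check_bond() would be expected to return False even though LACP is enabled on the switch ports.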

Solution:

The bug is fixed in 2.6.22:
bonding: Fix 802.3ad no carrier on "no partner found" instance

Comment 1 Dirk Nehring 2007-08-13 11:45:22 UTC
I tested 2.6.22 successfully. Here is the extracted fix:

http://www.mail-archive.com/netdev@vger.kernel.org/msg40353.html

Comment 2 Andy Gospodarek 2007-09-13 15:14:31 UTC
Interesting.  We will include this fix in an upcoming update.  Feel free to test
any of the kernels here:

http://people.redhat.com/agospoda/#rhel5

for a preview of what will be included in our upcoming update.  The patch
mentioned in comment #1 is not included, but many other updates are.


Comment 4 RHEL Program Management 2007-10-30 15:35:24 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Dirk Nehring 2007-10-30 19:08:05 UTC
Trunking is a very helpful feature and easy to apply. If you add an
iSCSI target to the new release, I think it is more than necessary. We have
been running a self-compiled kernel with 802.3ad support enabled for months.

Comment 6 Andy Gospodarek 2007-11-20 14:08:33 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.

Comment 7 Dirk Nehring 2007-11-28 17:12:00 UTC
I just verified the new kernel; 802.3ad (link aggregation) works now. Verified
with a 3Com gigabit switch.

Comment 8 Andy Gospodarek 2007-11-28 19:03:35 UTC
Great!  Thanks for the feedback!

Comment 9 Andy Gospodarek 2007-12-14 14:15:36 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.

Comment 10 Dirk Nehring 2007-12-14 14:25:15 UTC
Hmm, the last version works, so I see no need to test it again. Perhaps someone
else will test it?

Comment 11 Andy Gospodarek 2007-12-14 15:21:12 UTC
I'm glad to hear the last set of kernels worked.  I added a few more patches to this set to correct a few sysfs problems as well as problems that occurred when bringing the interface down.  If you can manage putting it on one of your systems to make sure these are no worse than the last ones I would appreciate it.

Comment 14 Don Zickus 2007-12-21 20:17:34 UTC
Included in 2.6.18-62.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 15 Dirk Nehring 2007-12-28 14:09:51 UTC
We just tested kernel-2.6.18-62.el5.i686.rpm. We confirm that it works with 
802.3ad. You can close the case.
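
To check whether a given RHEL 5 kernel is at or past the 2.6.18-62.el5 build named in comment 14, the release string can be compared numerically. This is only a rough Python sketch: it understands 2.6.18-N.el5 style strings and nothing else, the function names are hypothetical, and it is not a substitute for RPM's own version comparison.

```python
import re

# Build named in comment 14 as containing the patch.
FIXED_IN = "2.6.18-62.el5"

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)\.el5")

def kernel_key(release):
    """Turn '2.6.18-128.el5xen' into a tuple of ints such as (2, 6, 18, 128)."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("not a RHEL 5 style kernel release: %r" % release)
    base = tuple(int(part) for part in match.group(1, 2, 3))
    build = tuple(int(part) for part in match.group(4).split("."))
    return base + build

def has_fix(release):
    """True if the release sorts at or after the build named in FIXED_IN."""
    return kernel_key(release) >= kernel_key(FIXED_IN)
```

On a live system the release string would come from `uname -r` (or `platform.release()` in Python).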

Comment 16 Andy Gospodarek 2008-01-30 18:20:16 UTC
*** Bug 241719 has been marked as a duplicate of this bug. ***

Comment 18 Andy Gospodarek 2008-03-10 17:25:39 UTC
*** Bug 435249 has been marked as a duplicate of this bug. ***

Comment 19 Dirk Nehring 2008-03-10 17:44:37 UTC
Is this fix now accepted for the next kernel release?

Regards,

Dirk Nehring

Comment 20 Andy Gospodarek 2008-03-10 17:54:08 UTC
Yes, it will be in the next update.

Comment 21 Narendra K 2008-03-26 07:03:05 UTC
With respect to comment #18, I tested bonding on the RHEL 5.2 beta kernel
version 2.6.18-85.el5 with bonding driver version 3.2.4. Call traces are not
seen in dmesg when bonding is started in balance-alb mode.

Comment 23 errata-xmlrpc 2008-05-21 14:49:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


Comment 25 daniel 2009-02-05 03:32:18 UTC
Any chance of getting this fix in the xen kernels? Below is the backtrace from 2.6.18-128.el5xen:

Feb  5 13:08:21 pearlygates kernel: Pid: 3696, comm: ip Not tainted 2.6.18-128.el5xen #1
Feb  5 13:08:21 pearlygates kernel: RIP: e030:[<ffffffff80261157>]  [<ffffffff80261157>] __write_lock_failed+0xf/0x20
Feb  5 13:08:21 pearlygates kernel: RSP: e02b:ffff8801d30c9d70  EFLAGS: 00000206
Feb  5 13:08:21 pearlygates kernel: RAX: ffff8801d30c9fd8 RBX: ffff8801d6bc8530 RCX: 0000000000000000
Feb  5 13:08:21 pearlygates kernel: RDX: 0000000000000180 RSI: 000000000000004c RDI: ffff8801d6bc8530
Feb  5 13:08:21 pearlygates kernel: RBP: 0000000000000000 R08: ffff8801def49ac0 R09: ffff8801dbed8600
Feb  5 13:08:21 pearlygates kernel: R10: ffff8801dbed9c80 R11: ffffffff881dd594 R12: 0000000000000000
Feb  5 13:08:21 pearlygates kernel: R13: ffff8801d6bc81a8 R14: 0000000000000002 R15: ffff8801d6bc8000
Feb  5 13:08:21 pearlygates kernel: FS:  00002b05ac9057f0(0000) GS:ffffffff805ba300(0000) knlGS:0000000000000000
Feb  5 13:08:21 pearlygates kernel: CS:  e033 DS: 0000 ES: 0000
Feb  5 13:08:21 pearlygates kernel: 
Feb  5 13:08:21 pearlygates kernel: Call Trace:
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff80263b01>] _write_lock_bh+0x1a/0x1c
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff8842f972>] :bonding:bond_alb_set_mac_address+0x23a/0x254
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff8040fd76>] dev_set_mac_address+0x38/0x58
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff804119de>] dev_ioctl+0x386/0x465
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff80316453>] inode_has_perm+0x56/0x63
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff8040897b>] sock_ioctl+0x1d4/0x1e5
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff802437ba>] do_ioctl+0x21/0x6b
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff80231010>] vfs_ioctl+0x248/0x261
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff8024ddde>] sys_ioctl+0x59/0x78
Feb  5 13:08:21 pearlygates kernel:  [<ffffffff8025f2f9>] tracesys+0xab/0xb6

Comment 26 Andy Gospodarek 2009-02-05 17:24:00 UTC
(In reply to comment #25)
> Any chance of getting this fix in the xen kernels? Below is the backtrace from 
> 2.6.18-128.el5xen
> [...]

This backtrace appears to be related to a completely different mode of bonding (mode 5 or 6 rather than mode 4) since there is a reference to bond_alb_set_mac_address in the backtrace.
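
The reasoning above can be sketched in Python: pull the :bonding: symbols out of a call trace and see which part of the driver they come from. The mode numbers follow the kernel bonding documentation (4 is 802.3ad, 5 is balance-tlb, 6 is balance-alb). The helper names are hypothetical, and treating a bond_alb_ prefix as mode 5 or 6 and a bond_3ad_ prefix as mode 4 is a heuristic assumed for illustration, not a rule taken from the driver.

```python
import re

# Mode numbers as listed in the kernel bonding documentation.
BONDING_MODES = {
    0: "balance-rr", 1: "active-backup", 2: "balance-xor", 3: "broadcast",
    4: "802.3ad", 5: "balance-tlb", 6: "balance-alb",
}

# Matches frames such as
# "[<ffffffff8842f972>] :bonding:bond_alb_set_mac_address+0x23a/0x254"
_FRAME_RE = re.compile(r":bonding:(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def bonding_symbols(trace):
    """Return the bonding-module function names found in a call trace."""
    return _FRAME_RE.findall(trace)

def likely_modes(trace):
    """Guess which bonding modes a trace points at from symbol prefixes."""
    modes = set()
    for symbol in bonding_symbols(trace):
        if symbol.startswith("bond_alb_"):
            modes.update((5, 6))  # ALB/TLB code path (assumed heuristic)
        elif symbol.startswith("bond_3ad_"):
            modes.add(4)          # 802.3ad code path (assumed heuristic)
    return sorted(BONDING_MODES[m] for m in modes)
```

Fed the trace from comment #25, likely_modes() would point at balance-tlb and balance-alb rather than 802.3ad.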

If you are experiencing problems with bonding and RHEL5.3, please go through the normal support channels or open a new bugzilla, provide the needed reproduction information and assign it to agospoda.

I will be happy to take a look at it, but it's quite difficult for me to understand the situation with only this backtrace and no other information about what caused this panic to occur.

Thanks!