Bug 204795

Summary: Recursive locking on the bonding driver in balance-xor mode.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Sandeep K. Shandilya <sandeep_k_shandilya>
Assignee: John W. Linville <linville>
QA Contact: Brian Brock <bbrock>
CC: agospoda, davej, jarod, wtogami, wwlinuxengineering
Fixed In Version: kernel-2.6.18-1.2702
Doc Type: Bug Fix
Last Closed: 2006-11-09 12:45:45 UTC
Bug Blocks: 200812, 202141
Attachments:
Description                   Flags
backtrace and dmesg output.   none
backtrace and dmesg output.   none

Description Sandeep K. Shandilya 2006-08-31 16:46:54 UTC
Description of problem:
Configured bonding (balance-xor) with 3 NICs (tg3 driver).
When the bonding driver loads at boot, recursive locking is detected.
The messages are attached to this bug.

Line from modprobe.conf for this configuration:
options bond0 mode=balance-xor miimon=100 use_carrier=1
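
To double-check which options were requested for the bond, the line above can be split into key/value pairs. This is only a minimal sketch, assuming the line has the `options <name> key=value ...` form shown in this report; `parse_bonding_options` is a hypothetical helper written for illustration, not part of any bonding tooling.

```python
def parse_bonding_options(line):
    """Split a modprobe.conf 'options <name> key=value ...' line.

    Hypothetical helper for illustration only.
    Returns a tuple (name, {key: value}); values are kept as strings.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError("not an options line: %r" % line)
    name = fields[1]
    params = {}
    for field in fields[2:]:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % field)
        params[key] = value
    return name, params


# The options line quoted in this report.
name, params = parse_bonding_options(
    "options bond0 mode=balance-xor miimon=100 use_carrier=1"
)
print(name, params)
```

The values stay as strings on purpose, since the sketch makes no assumption about how the driver interprets them.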



Version-Release number of selected component (if applicable):
2.6.17-1.2600-smp

How reproducible:
Only with the SMP kernel

Steps to Reproduce:
1. Configure bonding in balance-xor mode.
2. Reboot the machine.
3. Check the output of dmesg for the recursive locking message.

The server is a PowerEdge 6800 with 2 processors, one 5701 adapter, and two 5704 LOMs. All NICs are Broadcom.

Actual results:
When the bonding driver loads, the kernel outputs
"possible recursive locking detected" and the bonding driver also reports
a duplicate MAC address.
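
For anyone trying to reproduce this, the check for the message can be scripted. The following is a rough sketch, assuming only that the kernel log contains the phrase quoted above; `find_lockdep_reports` is a hypothetical helper, and the sample text is a made-up placeholder, not the content of the attached dmesg output.

```python
# Phrase quoted in this report; the surrounding log format is not assumed.
LOCKDEP_MARKER = "possible recursive locking detected"


def find_lockdep_reports(log_text, context=3):
    """Return (line_number, snippet) pairs for each line containing the marker.

    line_number is 1-based; snippet is the matching line followed by up to
    `context` lines after it.
    """
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if LOCKDEP_MARKER in line:
            hits.append((index + 1, lines[index:index + 1 + context]))
    return hits


# Placeholder text standing in for real dmesg output (not real log lines).
sample = "\n".join([
    "placeholder line before",
    "placeholder: possible recursive locking detected",
    "placeholder detail 1",
    "placeholder detail 2",
])

for line_number, snippet in find_lockdep_reports(sample):
    print("marker found at line %d" % line_number)
    for text in snippet:
        print("    " + text)
```

On a real system, the text passed in would be the saved output of `dmesg` from the affected boot.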

Expected results:
There should be no recursive locking.

Additional info:
This does not happen with the Xen kernel. However, another problem occurs in
the same configuration: the NICs are enumerated as eth1, eth2, and eth3, and
eth0 seems to be already used by another device.
mii-tool eth0 reports "SIOCGMIIPHY on 'eth0' failed: Operation not supported"

Comment 1 David Lawrence 2006-09-05 15:13:01 UTC
Reassigning to correct owner, kernel-maint.

Comment 2 Samuel Benjamin 2006-09-14 00:10:41 UTC
This bug is on Dell's weekly watch list. Please assign to developer to
investigate. Thanks.

Comment 3 Sandeep K. Shandilya 2006-09-18 18:49:24 UTC
Created attachment 136568
backtrace and dmesg output.

This is the output of dmesg and also the backtrace of the issue.

Comment 4 Sandeep K. Shandilya 2006-09-18 18:50:49 UTC
Created attachment 136570
backtrace and dmesg output.

This is the output of dmesg and also the backtrace of the issue.

Comment 5 Jarod Wilson 2006-09-25 18:00:16 UTC
I'm seeing a recursive locking message with a bonded interface as well. One of
the two NICs is tg3, the other is ns83820 (both GbE), but the bonding mode is
active-backup.

Comment 6 Jarod Wilson 2006-09-25 20:28:50 UTC
Okay, this doesn't appear to be specific to the NIC driver. I'm getting the same
thing with dual 3c59x cards.

Comment 8 Larry Troan 2006-09-28 17:51:40 UTC
This was posted against fc6. Changed to RHEL5 beta per sly. 
This is a BUG.

Comment 10 Don Zickus 2006-09-28 22:31:49 UTC
in kernel-2.6.18-1.2707.el5.bz208456

Comment 11 John W. Linville 2006-09-28 23:04:46 UTC
FC6 test kernels available here:

   http://people.redhat.com/linville/kernels/fc6/

Please try to replicate the issue with those kernels, and post the results 
here...thanks!

Comment 13 Jarod Wilson 2006-09-29 19:40:51 UTC
(In reply to comment #11)
> FC6 test kernels available here:
> 
>    http://people.redhat.com/linville/kernels/fc6/
> 
> Please try to replicate the issue with those kernels, and post the results 
> here...thanks!

I see no more lockdep spew with 2.6.18-1.2708.2.1.fc6.jwltest.9.i686.

Comment 14 Amit Bhutani 2006-10-03 00:02:25 UTC
Sandeep, please try to regression-test with the people page kernel (see comment #11)
and report the results here so we can call this issue RIP!


Comment 17 RHEL Program Management 2006-10-09 18:46:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.

Comment 18 Jay Turner 2006-10-09 20:34:24 UTC
QE ack for RHEL5B2.

Comment 19 Sandeep K. Shandilya 2006-10-10 05:06:04 UTC
(In reply to comment #11)
> FC6 test kernels available here:
> 
>    http://people.redhat.com/linville/kernels/fc6/
> 
> Please try to replicate the issue with those kernels, and post the results 
> here...thanks!

I tested and found that this issue does not occur with kernel-2.6.18-1.2702
(the weekly RHEL 5 build of 18 September).


Comment 20 Amit Bhutani 2006-11-03 06:19:23 UTC
Fix verified. Please close. Thanks!