Bug 204795 - Recursive locking on the bonding driver in balance-xor mode.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: John W. Linville
QA Contact: Brian Brock
Depends On:
Blocks: 200812 FCMETA_LOCKDEP
Reported: 2006-08-31 12:46 EDT by Sandeep K. Shandilya
Modified: 2009-06-19 04:54 EDT (History)

See Also:
Fixed In Version: kernel-2.6.18-1.2702
Doc Type: Bug Fix
Last Closed: 2006-11-09 07:45:45 EST


Attachments
backtrace and dmesg output. (8.72 KB, text/plain)
2006-09-18 14:49 EDT, Sandeep K. Shandilya
backtrace and dmesg output. (8.72 KB, text/plain)
2006-09-18 14:50 EDT, Sandeep K. Shandilya

Description Sandeep K. Shandilya 2006-08-31 12:46:54 EDT
Description of problem:
Configured bonding (balance-xor) with three NICs (tg3 driver).
When the bonding driver loads at boot, the kernel reports possible recursive locking.
The messages are attached to this bug.

Relevant line from modprobe.conf:
options bond0 mode=balance-xor miimon=100 use_carrier=1



Version-Release number of selected component (if applicable):
2.6.17-1.2600-smp

How reproducible:
Only in the smp kernel

Steps to Reproduce:
1. Configure bonding in balance-xor mode.
2. Reboot the machine.
3. Check the output of dmesg for the recursive locking message.
The server is a PowerEdge 6800 with 2 processors, one 5701 adapter, and
two 5704 LOMs. All NICs are Broadcom.

Actual results:
When the bonding driver loads, the kernel prints "possible recursive
locking detected", and the bonding driver also reports a duplicate MAC
address.
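
A minimal sketch for checking a saved dmesg capture for the message described above. It assumes Python 3 and that the capture contains the phrase quoted in this report; the function name and the amount of context printed are illustrative choices, and nothing here depends on the exact layout of the kernel's report.

```python
# Phrase taken from the report above; the rest of the kernel's output
# format is deliberately not assumed.
LOCKDEP_MARKER = "possible recursive locking detected"


def find_lockdep_reports(dmesg_text, context=3):
    """Return (line_number, snippet) pairs for each line containing the marker.

    line_number is 1-based; snippet is the matching line plus up to
    `context` following lines.
    """
    lines = dmesg_text.splitlines()
    hits = []
    for idx, line in enumerate(lines):
        if LOCKDEP_MARKER in line:
            snippet = "\n".join(lines[idx:idx + 1 + context])
            hits.append((idx + 1, snippet))
    return hits
```

To use it, save the log first (for example `dmesg > dmesg.txt`), read the file into a string, and pass that string to `find_lockdep_reports`. An empty list means the phrase was not found in the capture.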

Expected results:
There should be no recursive locking.

Additional info:
This does not happen with the xen kernel, but another problem occurs there:
in the same configuration the NICs are enumerated as eth1, eth2 and eth3.
eth0 seems to be already used by another device.
mii-tool eth0 reports "SIOCGMIIPHY on 'eth0' failed: Operation not supported"
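
One way to see which driver, if any, backs each interface name is to walk sysfs. The sketch below assumes Python 3 and that the running kernel exposes a `device/driver` symlink under `/sys/class/net/<interface>`; the function name is hypothetical, and the report does not say what eth0 actually is on the xen kernel.

```python
import os


def list_net_drivers(sysfs_root="/sys/class/net"):
    """Map each interface name to its driver name, or None if no driver link exists."""
    result = {}
    for name in sorted(os.listdir(sysfs_root)):
        iface_dir = os.path.join(sysfs_root, name)
        # Skip plain files that may live in the same directory.
        if not os.path.isdir(iface_dir):
            continue
        driver_link = os.path.join(iface_dir, "device", "driver")
        if os.path.islink(driver_link):
            # The symlink target's last path component is the driver name.
            result[name] = os.path.basename(os.path.realpath(driver_link))
        else:
            result[name] = None
    return result
```

An interface that maps to `None` has no driver link under sysfs, which would be consistent with a virtual device rather than one of the physical NICs.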
Comment 1 David Lawrence 2006-09-05 11:13:01 EDT
Reassigning to correct owner, kernel-maint.
Comment 2 Samuel Benjamin 2006-09-13 20:10:41 EDT
This bug is on Dell's weekly watch list. Please assign to developer to
investigate. Thanks.
Comment 3 Sandeep K. Shandilya 2006-09-18 14:49:24 EDT
Created attachment 136568
backtrace and dmesg output.

This is the output of dmesg and also the back trace of the issue.
Comment 4 Sandeep K. Shandilya 2006-09-18 14:50:49 EDT
Created attachment 136570
backtrace and dmesg output.

This is the output of dmesg and also the back trace of the issue.
Comment 5 Jarod Wilson 2006-09-25 14:00:16 EDT
I'm seeing a recursive locking message with a bonded interface as well. One of
the two NICs is tg3, the other is ns83820 (both GbE), but the bonding mode is
active-backup.
Comment 6 Jarod Wilson 2006-09-25 16:28:50 EDT
Okay, this doesn't appear to be specific to the NIC driver. I'm getting the same
thing with dual 3c59x cards.
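
For background on why the message can appear regardless of NIC driver: the kernel's lock validator checks lock classes rather than individual lock instances, so taking a second lock of a class that is already held is reported unless the nesting is annotated with a distinct subclass. The toy model below illustrates only that rule. It is not the kernel's code, the class and lock names are hypothetical, and it is not a diagnosis of this bug, whose root cause is not stated in this report.

```python
class LockClassTracker:
    """Toy model of class-based lock validation for a single context."""

    def __init__(self):
        # (lock_class, subclass) pairs currently held, in acquisition order.
        self.held = []

    def acquire(self, lock_class, subclass=0):
        # Two different lock instances of the same class look identical here,
        # which is why the report says "possible" recursive locking.
        if (lock_class, subclass) in self.held:
            raise RuntimeError(
                "possible recursive locking detected: %s" % lock_class
            )
        self.held.append((lock_class, subclass))

    def release(self, lock_class, subclass=0):
        self.held.remove((lock_class, subclass))
```

In this model, acquiring `"dev_lock"` twice raises an error even if the two acquisitions stand for locks on different devices, while acquiring it a second time with `subclass=1` is accepted.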
Comment 8 Larry Troan 2006-09-28 13:51:40 EDT
This was posted against fc6. Changed to RHEL5 beta per sly. 
This is a BUG.
Comment 10 Don Zickus 2006-09-28 18:31:49 EDT
in kernel-2.6.18-1.2707.el5.bz208456
Comment 11 John W. Linville 2006-09-28 19:04:46 EDT
FC6 test kernels available here:

   http://people.redhat.com/linville/kernels/fc6/

Please try to replicate the issue with those kernels, and post the results 
here...thanks!
Comment 13 Jarod Wilson 2006-09-29 15:40:51 EDT
(In reply to comment #11)
> FC6 test kernels available here:
> 
>    http://people.redhat.com/linville/kernels/fc6/
> 
> Please try to replicate the issue with those kernels, and post the results 
> here...thanks!

I see no more lockdep spew with 2.6.18-1.2708.2.1.fc6.jwltest.9.i686.
Comment 14 Amit Bhutani 2006-10-02 20:02:25 EDT
Sandeep, please retest with the people page kernel (see comment #11)
and report the results here so we can call this issue RIP!
Comment 17 RHEL Product and Program Management 2006-10-09 14:46:48 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.
Comment 18 Jay Turner 2006-10-09 16:34:24 EDT
QE ack for RHEL5B2.
Comment 19 Sandeep K. Shandilya 2006-10-10 01:06:04 EDT
(In reply to comment #11)
> FC6 test kernels available here:
> 
>    http://people.redhat.com/linville/kernels/fc6/
> 
> Please try to replicate the issue with those kernels, and post the results 
> here...thanks!
I tested and found that this issue does not occur on kernel-2.6.18-1.2702,
the weekly RHEL 5 build of 18 September.
Comment 20 Amit Bhutani 2006-11-03 01:19:23 EST
Fix verified. Please close. Thanks!
