Bug 509816 - cciss: spinlock deadlock causes NMI on HP systems
cciss: spinlock deadlock causes NMI on HP systems
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
All Linux
urgent Severity high
: rc
: ---
Assigned To: Tomas Henzl
Evan McNabb
: ZStream
: 508014 (view as bug list)
Depends On:
Blocks: 509818 525725
  Show dependency treegraph
Reported: 2009-07-06 08:36 EDT by Prarit Bhargava
Modified: 2011-02-16 10:24 EST (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 509818 (view as bug list)
Last Closed: 2011-02-16 10:24:41 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
corrects the spinlock use (1.01 KB, patch)
2009-07-07 05:56 EDT, Tomas Henzl
no flags Details | Diff

  None (edit)
Description Prarit Bhargava 2009-07-06 08:36:11 EDT
Description of problem:

While comparing boot sequences from a 5.4 boot versus a RHEL4 boot, the following panic occurred (see details below).  After looking into the problem it looks like there is a spinlock deadlock in the code.

Version-Release number of selected component (if applicable): 2.6.9-78.0.1.ELsmp (but seems to be in current kernel as well).

How reproducible: Unknown/very low reproducibility.  Saw this one time on an HP system in the lab.  The next boot everything was okay....

Steps to Reproduce:
1.  Boot kernel.
Actual results:

cciss: controller cciss0 failed, stopping.
cciss0: controller not responding.
NMI Watchdog detected LOCKUP, CPU=6, registers:
CPU 6 
Modules linked in: vxodm(U) parport_pc lp parport netconsole netdump autofs4
i2c_dev i2c_core sunrpc iptable_filter ip_tables ib_srp ib_sdp ib_ipoib md5
ipv6 rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad
ib_core ide_dump scsi_dump diskdump zlib_deflate vfat fat vxportal(U) fdd(U)
vxfs(U) dmphpalua(U) dmpaaa(U) vxspec(U) vxio(U) vxdmp(U) button battery ac
k8_edac edac_mc netxen_nic e1000 bnx2 bonding(U) dm_snapshot dm_zero dm_mirror
ext3 jbd dm_mod qla2400 cciss qla2xxx scsi_transport_fc usb_storage uhci_hcd
ohci_hcd ehci_hcd sd_mod scsi_mod
Pid: 0, comm: swapper Tainted: PF     2.6.9-78.0.1.ELsmp
RIP: 0010:[<ffffffff80319600>] <ffffffff80319600>{.text.lock.spinlock+14}
RSP: 0018:000001007fd0bef8  EFLAGS: 00000082
RAX: 0000000000000026 RBX: 0000010871f238dc RCX: 0000000000000046
RDX: 000000000010a5e9 RSI: 0000000000000046 RDI: 0000010871f238dc
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000010871f20000
R10: 000001046f73e000 R11: 0000000000000000 R12: 0000000000000062
R13: 0000010c71f9fe98 R14: 0000000000000000 R15: 0000000000000002
FS:  0000002a95576b00(0000) GS:ffffffff8050d580(0000) knlGS:00000000de9f5ba0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000ad8984 CR3: 0000000c86174000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo 0000010c71f9e000, task 00000104800717f0)
Stack: 0000000000000046 0000000000000002 0000010871f20000 ffffffffa00939c6 
       0000000000000062 00000104702c9a40 0000000000000001 0000000000000062 
       0000010c71f9fe98 0000010c71f9fe98 
Call Trace:<IRQ> <ffffffffa00939c6>{:cciss:do_cciss_intr+210}
       <ffffffff8011326c>{do_IRQ+197} <ffffffff801108bf>{ret_from_intr+0} 
        <EOI> <ffffffff8010e789>{default_idle+0}

Expected results:

No NMI lockup should be seen.

Additional info:  I looked into the code and this is what I've come up with.

do_cciss_intr acquires the CCISS_LOCK(h->ctlr) lock.

In an error-handling situation, as evidenced by the above output, fail_all_cmds() is called.

fail_all_cmds() attempts to *reacquire* the lock.


FWIW, the same code exists in RHEL5 (I will clone this to RHEL5).
Comment 1 Tomas Henzl 2009-07-07 05:56:29 EDT
Created attachment 350763 [details]
corrects the spinlock use

This patch removes the spinlock lock/unlock from fail_all_cmds and adds a spin_unlock after the call to fail_all_cmds before the return.
Comment 2 RHEL Product and Program Management 2009-07-07 06:03:02 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
Comment 3 Mike Miller (OS Dev) 2009-07-07 12:05:42 EDT
The fix looks good to me. Do I need to do anything for this bug?
Comment 4 Tomas Henzl 2009-07-08 10:17:54 EDT
(In reply to comment #3)
> The fix looks good to me. Do I need to do anything for this bug?  

No, and thanks for the review. For open issues with cciss look for example here - 505506, 479090, 250485.
Comment 5 Tomas Henzl 2009-07-08 11:51:56 EDT
Comment 7 Lachlan McIlroy 2009-07-08 21:40:42 EDT
*** Bug 508014 has been marked as a duplicate of this bug. ***
Comment 8 Vivek Goyal 2009-07-14 14:52:29 EDT
Committed in 89.6.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 16 errata-xmlrpc 2011-02-16 10:24:41 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.