Bug 198541 - Kdump kernel panics while megaraid driver initialization
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Neil Horman
Brian Brock
bzcl34nup
Reported: 2006-07-11 14:07 EDT by IBM Bug Proxy
Modified: 2008-08-02 19:40 EDT
Doc Type: Bug Fix
Last Closed: 2008-05-06 20:40:06 EDT
Attachments
Fix to resolve megaraid driver initialization issues during kdump (93.11 KB, application/x-gzip)
2006-11-28 10:45 EST, Vivek Goyal
Fix to resolve megaraid driver initialization issues during kdump (93.11 KB, application/x-gzip)
2007-08-29 10:42 EDT, IBM Bug Proxy
Description IBM Bug Proxy 2006-07-11 14:07:23 EDT
LTC Owner is: vivegoya@in.ibm.com
LTC Originator is: vivegoya@in.ibm.com


Problem description:

The kdump kernel panics while initializing the megaraid device driver if
disk activity was in progress at the time of the crash.

Hardware Environment

Any machine with a megaraid controller should be able to reproduce the problem.
Originally, NEC reported the following environment.

  Kernel  : Linux 2.6.15.1
  kdump   : kexec-tools-1.101 + kexec-tools-1.101-kdump7.patch
  Arch    : i386/x86_64
  Package : RHEL4AS-U2GA
  Memory  : 5GB
  SCSI    : MegaRAID



Is this reproducible? Yes
    If so, how long does it (did it) take to reproduce it? 30 min
    Describe the steps:
- Build the first kernel as a UP kernel (not mandatory, but it gives more chances
  of reproducing the problem).
- Ensure some disk activity, such as a cp operation, is going on at the time of
  the crash.
- Load the kdump kernel and crash the system.


Did the system produce an OOPS message on the console?
    If so, copy it here:

    Loading megaraidmegaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST
2005)
  _mm.ko module
  Loading megaraidmegaraid: 2.20.4.6 (Release Date: Mon Mar 07 12:27:22 EST
2005)
  megaraid: probe new device 0x1000:0x1960:0x1000:0x0520:
                                                        bus 2:slot 2:func 0
  ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 193
  BUG: spinlock bad magic on CPU#0, insmod/340
   lock: ffff8100017d8028, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
                                                                                
  (omitted)
                                                                                
  Kernel panic - not syncing: Aiee, killing interrupt handler!

Some more info:
---------------
I can reproduce this issue on my LSI megaraid hardware. The problem happens
because the device starts sending interrupts before device initialization is
complete. In this case the interrupt handler tries to grab a spinlock which has
not been initialized yet.
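
The failure mode above can be illustrated with a small user-space model. This is
only a sketch, not megaraid driver code: every model_* name is hypothetical, and
the magic check merely imitates what a kernel built with spinlock debugging does
when it finds .magic: 00000000 instead of the expected value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value the kernel uses for SPINLOCK_MAGIC; everything else is a model. */
#define MODEL_LOCK_MAGIC 0xdead4eadu

struct model_lock {
    uint32_t magic;   /* 0 until model_lock_init() has run */
    bool     held;
};

/* Hypothetical per-adapter state shared with the interrupt handler. */
struct model_adapter {
    struct model_lock lock;
    int completed;    /* commands completed by the handler */
};

void model_lock_init(struct model_lock *l)
{
    assert(l != NULL);
    l->magic = MODEL_LOCK_MAGIC;
    l->held = false;
}

/* Returns false where a debug kernel would print "spinlock bad magic". */
bool model_lock_acquire(struct model_lock *l)
{
    assert(l != NULL);
    if (l->magic != MODEL_LOCK_MAGIC)
        return false;
    l->held = true;
    return true;
}

void model_lock_release(struct model_lock *l)
{
    assert(l != NULL);
    l->held = false;
}

/* Hypothetical interrupt handler: needs the lock before touching state. */
bool model_isr(struct model_adapter *a)
{
    if (!model_lock_acquire(&a->lock))
        return false;             /* interrupt arrived before init finished */
    a->completed++;
    model_lock_release(&a->lock);
    return true;
}
```

Calling model_isr() on a zeroed adapter fails the magic check, which corresponds
to an interrupt from the previous kernel's I/O arriving before the driver has
set up its lock. The same call succeeds once model_lock_init() has run.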

Even if spinlock debugging is not enabled in the second kernel, initialization
fails somewhere else, again because the interrupt handler tries to access data
structures which are no longer valid in the context of the new kernel.

I think the ideal solution would be to reset the device from software before we
register for the irq. This reset operation should de-assert the interrupt line
and flush the scsi command message queue on the controller.
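
A rough user-space sketch of that ordering follows. It assumes a soft reset
exists, which is exactly what has not been found for this controller, so
sketch_soft_reset() and all other sketch_* names are hypothetical stand-ins and
not part of the megaraid driver or its firmware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller state left behind by the crashed kernel. */
struct sketch_hw {
    bool irq_line_asserted;
    int  queued_cmds;
};

/* Hypothetical driver state in the kdump kernel. */
struct sketch_driver {
    bool lock_ready;       /* stands in for spin_lock_init() having run */
    bool irq_registered;   /* stands in for request_irq() having run */
    int  early_irqs;       /* interrupts handled before the lock was ready */
};

/* ASSUMPTION: a software reset that de-asserts the line and flushes the queue. */
void sketch_soft_reset(struct sketch_hw *hw)
{
    assert(hw != NULL);
    hw->irq_line_asserted = false;
    hw->queued_cmds = 0;
}

/* Deliver an interrupt only if the line is asserted and a handler exists. */
void sketch_deliver_irq(const struct sketch_hw *hw, struct sketch_driver *drv)
{
    if (!hw->irq_line_asserted || !drv->irq_registered)
        return;
    if (!drv->lock_ready)
        drv->early_irqs++;     /* this is the case that panics the kdump kernel */
}

/*
 * Probe with the handler registered before the lock is initialized, as the
 * failure suggests. reset_first toggles the proposed reset before registration.
 */
void sketch_probe(struct sketch_hw *hw, struct sketch_driver *drv, bool reset_first)
{
    if (reset_first)
        sketch_soft_reset(hw);
    drv->irq_registered = true;
    sketch_deliver_irq(hw, drv);   /* a stale interrupt fires immediately, if any */
    drv->lock_ready = true;
}
```

In this model, probing without the reset records one early interrupt, while
probing with the reset records none, because the stale interrupt line is cleared
before a handler is registered.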

But I cannot find any such operation when I go through the megaraid driver.
SCSI reset just flushes the messages which are still with the driver and
waits for the completion of messages which have already been issued to the
controller/firmware.

I am unable to find any detailed technical documentation about the card which
can tell me if firmware provides any facility to soft reset the device. Any
suggestions on this issue will be helpful.
Comment 2 Vivek Goyal 2006-10-03 11:29:32 EDT
I had sent a mail to Seokman,Ju in LSI regarding this issue. He said that he
will take it up with firmware guys. There is no response back yet.

Bottom line: as of today we do not know how to reset the megaraid
card from software to avoid the initialization problems.
Comment 3 Martin Wilck 2006-11-06 06:59:09 EST
Any indication whether this problem is fixed in megaraid 2.20.5?
Comment 4 Larry Troan 2006-11-20 13:22:08 EST
For RHEL, bug 208451 is open and appears to be the same problem. 
It's an rc blocker.
Comment 5 IBM Bug Proxy 2006-11-21 13:15:45 EST
----- Additional Comments From salina@us.ibm.com  2006-11-21 13:13 EDT -------
It seems that IBM is not allowed to view RH bug 208451.
We get:
  You are not authorized to access bug #208451

Is it possible to allow bugproxy@us.ibm.com to access that bug?

Thanks
Salina Chu
LTC screen team 
Comment 6 Vivek Goyal 2006-11-28 10:44:02 EST
Sumant Patro sent a patch to fix megaraid related problems in kdump. This patch
is attached in bug 211630. Attaching the patch in this bug too.
Comment 7 Vivek Goyal 2006-11-28 10:45:27 EST
Created attachment 142294
Fix to resolve megaraid driver initialization issues during kdump
Comment 8 Vivek Goyal 2006-11-28 14:36:28 EST
This patch is not in the RHEL kernels yet, so it's wrong to set MODIFIED. I am
changing the state. Will set to MODIFIED once the patch is in the RHEL kernels.
Comment 9 IBM Bug Proxy 2006-12-05 05:16:00 EST
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |TESTED




------- Additional Comments From smaneesh@in.ibm.com (prefers email at maneesh@in.ibm.com)  2006-12-05 05:13 EDT -------
As per Vivek's update (over phone), the patch was tested successfully at Red Hat.
Putting the bug as Tested and Submitted.
Comment 10 Red Hat Bugzilla 2007-05-03 00:51:12 EDT
User vgoyal@redhat.com's account has been closed
Comment 11 Neil Horman 2007-08-10 07:24:12 EDT
The patch in comment #7 isn't actually a patch but a completely new
version of the driver, and I don't feel comfortable doing a wholesale upgrade
to it.  This patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cd96d96f20f2509dfeb302548132e30f471c071a
is the upstream patch that I think fixes this problem.  If you can confirm that
it fixes your issue, I'll backport and post ASAP.  Thanks!
Comment 12 Neil Horman 2007-08-29 10:38:18 EDT
ping.  Any update here?
Comment 13 IBM Bug Proxy 2007-08-29 10:42:25 EDT
Created attachment 179141
Fix to resolve megaraid driver initialization issues during kdump
Comment 14 Neil Horman 2007-12-10 11:17:18 EST
That's not a fix; that's a whole version of the driver, which is exactly what you
provided in comment #7.  If you want this to make 5.2, you'll need to identify a
specific problem and fix in this driver.
Comment 15 Bug Zapper 2008-04-03 13:46:33 EDT
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Comment 16 Bug Zapper 2008-05-06 20:40:03 EDT
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp
