Bug 198541

Summary: Kdump kernel panics while megaraid driver initialization
Product: [Fedora] Fedora Reporter: IBM Bug Proxy <bugproxy>
Component: kernel Assignee: Neil Horman <nhorman>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhide CC: dzickus, jarod, junichi.nomura, lwang, martin.wilck, riel, syeghiay, triage, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-07 00:40:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Fix to resolve megaraid driver initialization issues during kdump none
Fix to resolve megaraid driver initialization issues during kdump none

Description IBM Bug Proxy 2006-07-11 18:07:23 UTC
LTC Owner is: vivegoya.com
LTC Originator is: vivegoya.com


Problem description:

The kdump kernel panics during initialization of the megaraid device driver if
disk activity was in progress at the time of the crash.

Hardware Environment

Any machine with a megaraid controller should be able to reproduce the problem.
Originally, NEC reported the following environment.

  Kernel  : Linux 2.6.15.1
  kdump   : kexec-tools-1.101 + kexec-tools-1.101-kdump7.patch
  Arch    : i386/x86_64
  Package : RHEL4AS-U2GA
  Memory  : 5GB
  SCSI    : MegaRAID



Is this reproducible? Yes
    If so, how long does it (did it) take to reproduce it? 30 min
    Describe the steps:
- Build the first kernel as a UP kernel (not mandatory, but it gives a better
  chance of reproducing the problem).
- Ensure some disk activity is going on at the time of the crash, such as a cp
  operation.
- Load the kdump kernel and crash the system.


Did the system produce an OOPS message on the console?
    If so, copy it here:

    Loading megaraidmegaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
  _mm.ko module
  Loading megaraidmegaraid: 2.20.4.6 (Release Date: Mon Mar 07 12:27:22 EST 2005)
  megaraid: probe new device 0x1000:0x1960:0x1000:0x0520: bus 2:slot 2:func 0
  ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 193
  BUG: spinlock bad magic on CPU#0, insmod/340
   lock: ffff8100017d8028, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
                                                                                
  (omitted)
                                                                                
  Kernel panic - not syncing: Aiee, killing interrupt handler!

Some more info:
---------------
I can reproduce this issue on my LSI megaraid hardware. The problem happens
because the device starts sending interrupts before device initialization is
complete. In this case the interrupt handler tries to grab a spin lock which
has not been initialized yet.
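
To illustrate the failure described above, here is a userspace sketch (not
megaraid code). The kernel's spinlock debugging stamps each initialized lock
with SPINLOCK_MAGIC (0xdead4ead), so a lock sitting in memory that was never
initialized shows .magic: 00000000, as in the console output. All sim_* names
and structures are hypothetical stand-ins, not kernel or driver APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Value the kernel's spinlock debugging stores in an initialized lock. */
#define SPINLOCK_MAGIC 0xdead4eadu

/* Hypothetical stand-in for a debug-enabled spinlock; not the kernel type. */
struct sim_spinlock {
    uint32_t magic;
    bool locked;
};

/* Hypothetical per-adapter state; the lock lives inside it. */
struct sim_adapter {
    struct sim_spinlock lock;
    unsigned completed;
};

static void sim_spin_lock_init(struct sim_spinlock *l)
{
    l->magic = SPINLOCK_MAGIC;
    l->locked = false;
}

/* Returns false where a debug kernel would print "spinlock bad magic". */
static bool sim_spin_lock(struct sim_spinlock *l)
{
    assert(l != NULL);
    if (l->magic != SPINLOCK_MAGIC)
        return false;
    l->locked = true;
    return true;
}

static void sim_spin_unlock(struct sim_spinlock *l)
{
    l->locked = false;
}

/* Simulated interrupt handler: take the adapter lock, record a completion. */
static bool sim_isr(struct sim_adapter *a)
{
    if (!sim_spin_lock(&a->lock))
        return false;
    a->completed++;
    sim_spin_unlock(&a->lock);
    return true;
}

/* Interrupt arrives on zeroed adapter memory, before the lock is set up. */
static bool isr_before_init(void)
{
    struct sim_adapter a;
    memset(&a, 0, sizeof a);
    return sim_isr(&a);
}

/* Same interrupt, but after the lock has been initialized. */
static bool isr_after_init(void)
{
    struct sim_adapter a;
    memset(&a, 0, sizeof a);
    sim_spin_lock_init(&a.lock);
    return sim_isr(&a);
}
```

In this model isr_before_init() returns false, which is the point where a debug
kernel would report the bad magic, while isr_after_init() returns true.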

Even if spinlock debugging is not enabled in the second kernel, initialization
fails somewhere else, again because the interrupt handler tries to access data
structures which are no longer valid in the context of the new kernel.

I think the ideal solution would be to reset the device from software before we
register for the irq. This reset operation should de-assert the interrupt line
and flush the scsi command message queue on the controller.
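
The ordering proposed here can be sketched as a small userspace model (not
driver code). It assumes, as this report implies, that the handler is
registered before the driver's lock is initialized, and that a pending
interrupt is delivered as soon as the handler is registered. sim_soft_reset()
is hypothetical: it stands for the reset operation that has not been found in
the driver or firmware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a controller left busy by the crashed kernel. */
struct sim_hw {
    bool irq_asserted;   /* interrupt line still raised by old I/O */
    int  queued_cmds;    /* commands the firmware still holds */
};

/* Hypothetical driver state in the freshly booted kdump kernel. */
struct sim_driver {
    bool lock_ready;     /* stands in for spin_lock_init() having run */
    bool irq_registered;
    int  bad_interrupts; /* interrupts handled before state was ready */
};

/* Hypothetical soft reset: de-assert the line and flush queued commands. */
static void sim_soft_reset(struct sim_hw *hw)
{
    hw->irq_asserted = false;
    hw->queued_cmds = 0;
}

/* Registering the handler delivers any interrupt that is already pending. */
static void sim_request_irq(struct sim_driver *drv, const struct sim_hw *hw)
{
    assert(drv != NULL && hw != NULL);
    drv->irq_registered = true;
    if (hw->irq_asserted && !drv->lock_ready)
        drv->bad_interrupts++;
}

/* Controller state inherited from the crashed kernel: I/O was in flight. */
static struct sim_hw busy_hw(void)
{
    struct sim_hw hw = { true, 3 };
    return hw;
}

/* Failing order: register the irq, then finish initialization. */
static int probe_without_reset(void)
{
    struct sim_hw hw = busy_hw();
    struct sim_driver drv;
    memset(&drv, 0, sizeof drv);
    sim_request_irq(&drv, &hw);
    drv.lock_ready = true;
    return drv.bad_interrupts;
}

/* Proposed order: reset the device first, then register the irq. */
static int probe_with_reset(void)
{
    struct sim_hw hw = busy_hw();
    struct sim_driver drv;
    memset(&drv, 0, sizeof drv);
    sim_soft_reset(&hw);
    sim_request_irq(&drv, &hw);
    drv.lock_ready = true;
    return drv.bad_interrupts;
}
```

In this model probe_without_reset() counts one interrupt handled before the
driver state was ready, and probe_with_reset() counts none.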

But I cannot find any such operation when I go through the megaraid driver.
SCSI reset just flushes the messages which are still with the driver and waits
for the completion of messages which are already issued to the
controller/firmware.

I am unable to find any detailed technical documentation about the card which
can tell me if firmware provides any facility to soft reset the device. Any
suggestions on this issue will be helpful.

Comment 2 Vivek Goyal 2006-10-03 15:29:32 UTC
I sent a mail to Seokman,Ju at LSI regarding this issue. He said that he will
take it up with the firmware guys. There has been no response yet.

Bottom line, as of today we are not aware of how to reset the megaraid card
from software to avoid initialization problems.

Comment 3 Martin Wilck 2006-11-06 11:59:09 UTC
Any indication whether this problem is fixed in megaraid 2.20.5?

Comment 4 Larry Troan 2006-11-20 18:22:08 UTC
For RHEL, bug 208451 is open and appears to be the same problem. 
It's an rc blocker.

Comment 5 IBM Bug Proxy 2006-11-21 18:15:45 UTC
----- Additional Comments From salina.com  2006-11-21 13:13 EDT -------
It seems like IBM is not allowed to view RH 208451
we get  
  You are not authorized to access bug #208451

Is it possible to allow bugproxy.com to access that bug?

Thanks
Salina Chu
LTC screen team 

Comment 6 Vivek Goyal 2006-11-28 15:44:02 UTC
Sumant Patro sent a patch to fix megaraid related problems in kdump. This patch
is attached to bug 211630. Attaching the patch to this bug too.

Comment 7 Vivek Goyal 2006-11-28 15:45:27 UTC
Created attachment 142294 [details]
Fix to resolve megaraid driver initialization issues during kdump

Comment 8 Vivek Goyal 2006-11-28 19:36:28 UTC
This patch is not in the rhel kernels yet, so it's wrong to set modified. I am
changing the state. Will set to modified once the patch is in rhel kernels.


Comment 9 IBM Bug Proxy 2006-12-05 10:16:00 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |TESTED




------- Additional Comments From smaneesh.com (prefers email at maneesh.com)  2006-12-05 05:13 EDT -------
As per Vivek's update (over phone), the patch was tested successfully at Red Hat.
Marking the bug as Tested and Submitted.

Comment 10 Red Hat Bugzilla 2007-05-03 04:51:12 UTC
User vgoyal's account has been closed

Comment 11 Neil Horman 2007-08-10 11:24:12 UTC
The attachment in comment #7 isn't actually a patch but a completely new
version of the driver, and I don't feel comfortable doing a wholesale upgrade
to it.  This patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cd96d96f20f2509dfeb302548132e30f471c071a
is the upstream patch that I think fixes this problem.  If you can confirm that
it fixes your issue, I'll backport and post ASAP.  Thanks!

Comment 12 Neil Horman 2007-08-29 14:38:18 UTC
ping.  Any update here?

Comment 13 IBM Bug Proxy 2007-08-29 14:42:25 UTC
Created attachment 179141 [details]
Fix to resolve megaraid driver initialization issues during kdump

Comment 14 Neil Horman 2007-12-10 16:17:18 UTC
That's not a fix, that's a whole version of the driver, which is exactly what
you provided in comment #7.  If you want this to make 5.2, you'll need to
identify a specific problem and fix in this driver.

Comment 15 Bug Zapper 2008-04-03 17:46:33 UTC
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

Comment 16 Bug Zapper 2008-05-07 00:40:03 UTC
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp