LTC Owner is: vivegoya.com
LTC Originator is: vivegoya.com

Problem description:
The kdump kernel panics during initialization of the megaraid device driver if some disk activity was going on at the time of the crash.

Hardware Environment
Any machine with a megaraid controller should be able to reproduce the problem. Originally, NEC reported the following environment:

Kernel  : Linux 2.6.15.1
kdump   : kexec-tools-1.101 + kexec-tools-1.101-kdump7.patch
Arch    : i386/x86_64
Package : RHEL4AS-U2GA
Memory  : 5GB
SCSI    : MegaRAID

Is this reproducible?
Yes

If so, how long does it (did it) take to reproduce it?
30 min

Describe the steps:
- Build the first kernel as a UP kernel (not mandatory, but it gives more chances of reproducing the problem).
- Ensure some disk activity, such as a cp operation, is going on at the time of the crash.
- Load the kdump kernel and crash the system.

Did the system produce an OOPS message on the console? If so, copy it here:

Loading megaraidmegaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
_mm.ko module
Loading megaraidmegaraid: 2.20.4.6 (Release Date: Mon Mar 07 12:27:22 EST 2005)
megaraid: probe new device 0x1000:0x1960:0x1000:0x0520: bus 2:slot 2:func 0
ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 16 (level, low) -> IRQ 193
BUG: spinlock bad magic on CPU#0, insmod/340
 lock: ffff8100017d8028, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
(omitted)
Kernel panic - not syncing: Aiee, killing interrupt handler!

Some more info:
---------------
I can reproduce this issue on my LSI megaraid hardware. The problem happens because the device starts sending interrupts before device initialization is complete. In this case the interrupt handler tries to grab a spinlock which has not been initialized yet. Even if spinlock debugging is not enabled in the second kernel, initialization fails somewhere else, again because the interrupt handler tries to access data structures which are no longer valid in the context of the new kernel.
I think the ideal solution would be to reset the device from software before we register for the IRQ. This reset operation should de-assert the interrupt line and flush the SCSI command message queue on the controller. But I cannot find any such operation when I go through the megaraid driver. A SCSI reset just flushes the messages which are still with the driver and waits for the completion of messages which have already been issued to the controller/firmware. I am unable to find any detailed technical documentation about the card which can tell me whether the firmware provides any facility to soft reset the device. Any suggestions on this issue will be helpful.
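To make the ordering problem concrete, here is a minimal user-space model in C. It is only a sketch, not megaraid driver code: every name in it (`mock_adapter`, `mock_request_irq`, `mock_quiesce`, and so on) is hypothetical, and `mock_quiesce` stands in for the software reset that, as noted above, has not been found for this card. The model assumes that registering a handler immediately delivers an interrupt the device is still asserting from the crashed kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for an initialized lock; not a real kernel constant. */
#define MOCK_LOCK_MAGIC 0x4c4f434bu

struct mock_lock {
    unsigned int magic;
};

/* User-space stand-in for a controller inherited from the crashed kernel. */
struct mock_adapter {
    struct mock_lock lock;
    bool irq_pending;      /* interrupt line still asserted */
    int outstanding_cmds;  /* commands issued before the crash */
    bool irq_registered;
    int bad_magic_hits;    /* times the handler saw an uninitialized lock */
};

/* Adapter state as the second kernel finds it: zeroed lock, pending IRQ. */
struct mock_adapter mock_adapter_after_crash(void)
{
    struct mock_adapter a = { .irq_pending = true, .outstanding_cmds = 3 };
    return a;
}

void mock_lock_init(struct mock_lock *l)
{
    assert(l != NULL);
    l->magic = MOCK_LOCK_MAGIC;
}

/* Interrupt handler: it needs the lock before touching adapter state. */
void mock_isr(struct mock_adapter *a)
{
    assert(a != NULL);
    if (a->lock.magic != MOCK_LOCK_MAGIC) {
        a->bad_magic_hits++;  /* models the "spinlock bad magic" report */
        return;
    }
    a->irq_pending = false;   /* acknowledge the interrupt */
}

/* Registering the handler delivers any interrupt that is already pending. */
void mock_request_irq(struct mock_adapter *a)
{
    assert(a != NULL);
    a->irq_registered = true;
    if (a->irq_pending)
        mock_isr(a);
}

/* Hypothetical soft reset; no real equivalent is known in this thread. */
void mock_quiesce(struct mock_adapter *a)
{
    assert(a != NULL);
    a->irq_pending = false;
    a->outstanding_cmds = 0;
}

/* Ordering that matches the failure: handler live before the lock exists. */
void probe_irq_first(struct mock_adapter *a)
{
    mock_request_irq(a);
    mock_lock_init(&a->lock);
}

/* Proposed ordering: set up state, quiesce the device, then take the IRQ. */
void probe_quiesce_first(struct mock_adapter *a)
{
    mock_lock_init(&a->lock);
    mock_quiesce(a);
    mock_request_irq(a);
}
```

In this model, calling `probe_irq_first` on an adapter from `mock_adapter_after_crash()` leaves `bad_magic_hits` at 1, while `probe_quiesce_first` leaves it at 0 with no interrupt pending. Whether the real controller can be quiesced this way is exactly the open question above.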
I sent a mail to Seokman Ju at LSI regarding this issue. He said that he will take it up with the firmware team. There has been no response yet. Bottom line: as of today, we do not know how to reset the megaraid card from software to avoid the initialization problems.
Any indication whether this problem is fixed in megaraid 2.20.5?
For RHEL, bug 208451 is open and appears to be the same problem. It's an rc blocker.
------- Additional Comments From salina.com 2006-11-21 13:13 EDT -------
It seems IBM is not allowed to view RH bug 208451. We get:

You are not authorized to access bug #208451

Is it possible to allow bugproxy.com to access that bug?

Thanks,
Salina Chu
LTC screen team
Sumant Patro sent a patch to fix megaraid related problems in kdump. This patch is attached in bug 211630. Attaching the patch in this bug too.
Created attachment 142294 [details] Fix to resolve megaraid driver initialization issues during kdump
This patch is not in the RHEL kernels yet, so it's wrong to set the state to MODIFIED. I am changing the state and will set it to MODIFIED once the patch is in the RHEL kernels.
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |TESTED

------- Additional Comments From smaneesh.com (prefers email at maneesh.com) 2006-12-05 05:13 EDT -------
As per Vivek's update (over phone) the patch is tested successfully at Redhat. Putting the bug as Tested and Submitted.
The patch in comment #7 isn't actually a patch; it is a completely new version of the driver, and I don't feel comfortable doing a wholesale upgrade to it.

This is the upstream patch that I think fixes this problem:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cd96d96f20f2509dfeb302548132e30f471c071a

If you can confirm that it fixes your issue, I'll backport and post ASAP. Thanks!
ping. Any update here?
Created attachment 179141 [details] Fix to resolve megaraid driver initialization issues during kdump
That's not a fix, that's a whole new version of the driver, which is exactly what you provided in comment #7. If you want this to make 5.2, you'll need to identify a specific problem and a specific fix in this driver.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp