Bug 250266
Summary: | megaraid SAS driver on 2.6.21-31.el5rt hangs on boot | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Clark Williams <williams> |
Component: | realtime-kernel | Assignee: | Arnaldo Carvalho de Melo <acme> |
Status: | CLOSED NOTABUG | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 1.0 | CC: | acme, bo.yang, gozen, sumant.patro |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-03-08 15:00:23 UTC | Type: | --- |
Description
Clark Williams
2007-07-31 15:06:03 UTC
Created attachment 160330
Boot log of Dell 1950 with megasas boot drive
Just to chime in here: I have hit this issue with 2.6.21-31.el5rt, and acme pointed me to the kernel-rt-2.6.21-35mega.x86_64.rpm kernel, which has been working fine for me. To be doubly sure, I ran the disktest test suite to stress the drive, and it's holding up well:

    # iostat
    Linux 2.6.21-35mega (dell-pe1950-03.rhts.boston.redhat.com)   08/16/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.68    0.00    2.29   33.75    0.00   62.28

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda             382.24     21081.73     23664.81 1789252396 2008483932
    dm-0           3079.61     21081.69     23664.84 1789248618 2008486184
    dm-1              0.00         0.01         0.00        920        280

Created attachment 161776
megaraid_sas IRQF_NODELAY temporary fix
Hmm, so I am trying out the -37 kernel, but the megasas driver is still problematic there:

    # uname -a
    Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-37.el5rt #1 SMP PREEMPT RT Thu Aug 30 16:05:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

I was running some Open MPI programs across two identical nodes, both with megasas cards, and one node or the other would crash at some point. Here is a typical error message before the box freezes:

    sd 0:2:0:0: megasas: RESET -7567 cmd=2a
    megasas: [ 0]waiting for 2 commands to complete

You should be able to reproduce this with any network program such as iperf, but if you want to use what I did, install the openmpi* packages from the tree, get the mpitests source RPM from http://people.redhat.com/dledford/Infiniband/mpitests/2.0/2.el5/src/mpitests-2.0-2.el5.src.rpm, build it, and run the following command:

    mpirun -np 2 --host host1,host2 --mca btl tcp,self /usr/bin/mpitests-IMB_MPI1

It doesn't always happen on the first run; it happens at some random point, usually within the first few tries.

I added a module parameter to the megasas driver that changes how the IRQ is requested. Loading the module without the parameter uses the default IRQF_SHARED, while 'nodelay=1' uses IRQF_NODELAY. This was built into the -39 kernel.

Unfortunately, the IRQF_NODELAY patch doesn't seem to fix it now. Booting the -39 kernel on my Dell 1950 without megasas.nodelay=1 fails as before (it hangs during boot with the "waiting for X commands to complete" message). Booting -39 with megasas.nodelay=1 allows the system to boot, but it then fails after sitting idle for a while. It looks like the megaraid SAS adapter was reset and didn't actually come back up afterwards. I'll attach a console log of the boot and failure messages.

Created attachment 204301
Boot log of x86_64 kernel-rt-2.6.21-39.el5rt failure on Dell 1950
http://lkml.org/lkml/2007/10/3/283 looks promising; building a kernel to check that...

I just backported the latest megasas code to the 2.6.21-based RT kernel (driver version: v00.00.03.16-rc1) and am still seeing the failure. I'm going to build and test a vanilla version of this kernel to confirm that this is -rt specific, then move on to using logdev to match up SCSI commands and completions.

I built kernels with the latest megasas driver and tested the following configurations:

    PREEMPT_NONE:     runs
    PREEMPT_DESKTOP:  runs
    +PREEMPT_SOFTIRQ: runs
    +PREEMPT_HARDIRQ: fails

So what seems to upset the megasas driver is running interrupt handlers as threads. I'm going to add rostedt's logdev patch and start logging commands and completions, to see whether command completions are coming back and we are somehow losing them, or whether they are not coming back at all.

The failure seems to be that the adapter stops interrupting for command completions. Once that happens, the SCSI mid-layer times out and issues a bus reset, which calls megasas_generic_reset(). The driver then reclaims all the commands that have been issued and completed (which is all of the outstanding commands). At this point the driver reports that it has been successfully reset and returns success to the SCSI layer. Unfortunately, from then on the adapter never seems to generate another interrupt, even though commands issued to it seem to complete successfully.

All my attempts at kicking the firmware into restarting interrupt generation have failed. From inside the reset routine I've tried:

1. Clearing the interrupt multiple times: instance->instancet->clear_intr()
2. Disabling and re-enabling interrupts
3. Calling megasas_issue_init_mfi()

I also modified the clear_intr() routines to keep writing back the interrupt ack until the status shows that the interrupt has been acknowledged. No change in behavior.

If the nodelay=1 parameter is actually passed to the driver with the IRQF_NODELAY hack, the problem goes away.
There were some false alarms about this hack not being effective, but they were due to the parameter not really being passed, so the default (running the interrupt handler as a kernel thread) was used, which triggers the problem. One way to be really sure that the parameter is being passed is to add this entry to /etc/modprobe.conf:

    options scsi_hostadapter nodelay=1

and then run mkinitrd again, using '-f' to overwrite the previous initrd file. This assumes that megaraid_sas is the first SCSI host adapter on the machine. While this doesn't fix the problem, it at least gives us the data point that registering the megaraid_sas IRQ handler with IRQF_NODELAY (so that it does not run as a kernel thread) allows the driver to run without problems, at least on machines where megaraid is the only SCSI adapter present.

Created attachment 230321
Patch to allow selecting IRQF_NODELAY over IRQF_SHARED for interrupt
Here is the patch acme mentioned that allows selecting the IRQ behavior. If the
module parameter nodelay is passed in as "nodelay=1", the interrupt is
registered with the IRQF_NODELAY attribute (rather than IRQF_SHARED). On the RT
kernel this parameter causes the interrupt to be ack'ed in the ISR, rather
than later when the dedicated IRQ thread is scheduled.
Note that after conversations with jburke, acme and lclaudio, my comments in #6 and #7 turned out to be incorrect, in that the initrd didn't contain the appropriate modprobe.conf modifications. It still looks like IRQF_NODELAY is a valid workaround for this problem.

I just tried the initial build of a 2.6.23-based RT kernel on my 1950, and the IRQF_NODELAY option did *not* keep the adapter from deciding to stop interrupting. So I think IRQF_NODELAY works for 2.6.21, but fails for 2.6.23.

In the test setup, has anyone tried a SCSI/SAS adapter from a different vendor? Could you please set the controller to "factory defaults" and see if you still get the issue? Is there a firmware update available for the controller you are using? Is it possible that the controller generates an interrupt but it never gets propagated to the driver?
Thanks,
Sumant

Sumant, I haven't tried a different adapter (it's the default adapter shipped with the Dell 1950). I'm not sure what replacing it entails, and I don't have anything handy to replace it with. I know we've seen this issue on at least two boxes other than mine, though. I will reset the controller to factory defaults. I did update the firmware to the latest on the Dell website: when I originally reported this bug the firmware was at version 5.0.1-003, and I have upgraded it to 5.1.1 (for a PERC 5i). No change in behavior. I was just getting around to poking at the IOAPIC to see if there's some problem propagating an interrupt from the adapter. The soft state for the IOAPIC indicates that the interrupt is /not/ masked, but I haven't confirmed that the soft state matches the actual hardware.
Clark

Sumant, I reset the adapter to factory defaults and reran the tests (basically booting into the -rt kernel, then running the LTP racer.sh script and possibly a 'dd if=/dev/sda of=/dev/null'), with no change. It takes 5 to 15 minutes for the failure to manifest.
Clark

Current workaround: boot with "noapic"; this has survived many tests on all the machines used by me, Clark and jburke.

I have observed the very same behavior described in this bug report with a different kernel module, megaraid_mbox. The error messages appeared with both 2.6.21-57.el5rt and -61.el5rt.

Messages from loading megaraid_mbox:

    Jan 2 07:59:41 dell-pe1850-01 kernel: megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
    Jan 2 07:59:41 dell-pe1850-01 kernel: SCSI subsystem initialized
    Jan 2 07:59:41 dell-pe1850-01 kernel: megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
    Jan 2 07:59:41 dell-pe1850-01 kernel: megaraid: probe new device 0x1028:0x0013:0x1028:0x016c: bus 2:slot 14:func 0
    Jan 2 07:59:41 dell-pe1850-01 kernel: ACPI: PCI Interrupt 0000:02:0e.0[A] -> GSI 46 (level, low) -> IRQ 46
    Jan 2 07:59:41 dell-pe1850-01 kernel: megaraid: fw version:[513O] bios version:[H418]
    Jan 2 07:59:41 dell-pe1850-01 kernel: scsi0 : LSI Logic MegaRAID driver

Error messages:

    megaraid: 1 outstanding commands. Max wait 300 sec
    megaraid mbox: Wait for 0 commands to complete:300
    megaraid mbox: reset sequence completed sucessfully
    megaraid: aborting-12123 cmd=28 <c=1 t=0 l=0>
    megaraid abort: 12123:50[255:128], fw owner
    megaraid: 1 outstanding commands. Max wait 300 sec
    megaraid mbox: Wait for 0 commands to complete:300
    megaraid mbox: reset sequence completed sucessfully
    megaraid: aborting-12124 cmd=28 <c=1 t=0 l=0>
    megaraid abort: 12124:50[255:128], fw owner
    megaraid: 1 outstanding commands. Max wait 300 sec
    megaraid mbox: Wait for 0 commands to complete:300
    megaraid mbox: reset sequence completed sucessfully
    end_request: I/O error, dev sda, sector 5360429

This has been found to be a symptom of a misbehaving IOAPIC (it doesn't like the -rt way of servicing interrupts, which is to ack+mask, then unmask at some later point). Closing this against the megaraid SAS driver; I will open a tracker against misbehaving IOAPICs.