Bug 250266

Summary: megaraid SAS driver on 2.6.21-31.el5rt hangs on boot
Product: Red Hat Enterprise MRG Reporter: Clark Williams <williams>
Component: realtime-kernel Assignee: Arnaldo Carvalho de Melo <acme>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 1.0 CC: acme, bo.yang, gozen, sumant.patro
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-03-08 15:00:23 UTC Type: ---
Attachments:
Description (Flags)
Boot log of Dell 1950 with megasas boot drive (none)
megaraid_sas IRQF_NODELAY temporary fix (none)
Boot log of x86_64 kernel-rt-2.6.21-39.el5rt failure on Dell 1950 (none)
Patch to allow selecting IRQF_NODELAY over IRQF_SHARED for interrupt (none)

Description Clark Williams 2007-07-31 15:06:03 UTC
Description of problem:
The megaraid SAS driver loses a command completion and hangs the boot process.
This same system (Dell 1950, two dual-core Xeons) runs RHEL5 GA just fine.

Version-Release number of selected component (if applicable):
RHEL-RT 2.6.21-31.el5rt

How reproducible:

often

Steps to Reproduce:
1. boot RHEL-RT
2. Normal boot messages proceed until the megasas reset message
3. System hangs while periodic megasas debug messages print on the console
  
Actual results:
See attached boot log

Expected results:
Normal boot

Additional info:

Comment 1 Clark Williams 2007-07-31 15:06:03 UTC
Created attachment 160330 [details]
Boot log of Dell 1950 with megasas boot drive

Comment 2 Gurhan Ozen 2007-08-14 21:43:18 UTC
Just to chime in here.. I have hit this issue with 2.6.21-31.el5rt and acme
pointed me to kernel-rt-2.6.21-35mega.x86_64.rpm kernel which has been working
fine for me. 


Comment 3 Gurhan Ozen 2007-08-16 15:45:38 UTC
Just to be doubly sure, I ran some disktest tests to stress the drive, and
it's holding up well:

# iostat
Linux 2.6.21-35mega (dell-pe1950-03.rhts.boston.redhat.com)     08/16/2007

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.68    0.00    2.29   33.75    0.00   62.28

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             382.24     21081.73     23664.81 1789252396 2008483932
dm-0           3079.61     21081.69     23664.84 1789248618 2008486184
dm-1              0.00         0.01         0.00        920        280


Comment 4 Arnaldo Carvalho de Melo 2007-08-18 00:10:08 UTC
Created attachment 161776 [details]
megaraid_sas IRQF_NODELAY temporary fix

Comment 5 Gurhan Ozen 2007-09-04 20:12:53 UTC
Humm, so I am trying out the -37 kernel, but the megasas driver is still
problematic there:

# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-37.el5rt #1 SMP PREEMPT RT
Thu Aug 30 16:05:41 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

I was running some openmpi programs over two identical nodes both with megasas
cards and either one of the nodes would crash at some point. Here is a typical
error message before the box is frozen:

 sd 0:2:0:0: megasas: RESET -7567 cmd=2a
megasas: [ 0]waiting for 2 commands to complete
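
When comparing console logs across kernels, it can help to pull these two message types out mechanically. The following is only a minimal sketch: the regular expressions are modelled on the two lines quoted above, and the function name `scan_megasas_log` is hypothetical. Other driver versions may word these messages differently.

```python
import re

# Patterns modelled on the two console lines quoted in this comment.
# Assumption: other driver versions may format these messages differently.
RESET_RE = re.compile(r"megasas: RESET (-?\d+) cmd=([0-9a-fA-F]+)")
WAIT_RE = re.compile(r"megasas: \[\s*(\d+)\]waiting for (\d+) commands to complete")


def scan_megasas_log(lines):
    """Collect megasas reset and wait messages from an iterable of log lines."""
    resets = []  # (number printed after RESET, cmd value parsed as hex)
    waits = []   # (bracketed counter, outstanding command count)
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            resets.append((int(m.group(1)), int(m.group(2), 16)))
            continue
        m = WAIT_RE.search(line)
        if m:
            waits.append((int(m.group(1)), int(m.group(2))))
    return {"resets": resets, "waits": waits}


sample = [
    " sd 0:2:0:0: megasas: RESET -7567 cmd=2a",
    "megasas: [ 0]waiting for 2 commands to complete",
]
print(scan_megasas_log(sample))
```

A non-empty `waits` list in a boot log is a quick hint that the box hit the hang described in this bug.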


You should be able to reproduce this with any network program such as iperf, but
if you have to use what I did, install the openmpi* packages from the tree,
get the mpitests source rpm from
http://people.redhat.com/dledford/Infiniband/mpitests/2.0/2.el5/src/mpitests-2.0-2.el5.src.rpm
build it and execute the following command:

mpirun -np 2 --host host1,host2 --mca btl tcp,self /usr/bin/mpitests-IMB_MPI1

It doesn't always happen the first time around, it just happens at some random
point, usually within the first few tries. 


Comment 6 Clark Williams 2007-09-24 16:17:44 UTC
I added a module parameter to the megasas driver that would allow modification
of how the IRQ is requested. Loading the module without the parameter causes
the default IRQF_SHARED to be used, while 'nodelay=1' will cause IRQF_NODELAY to
be used. This was built into the -39 kernel.

Unfortunately, the IRQF_NODELAY patch doesn't seem to fix it now. Booting the
-39 kernel on my Dell 1950 without the megasas.nodelay=1 fails as before (hangs
during boot with the "waiting for X commands to complete" message). Booting -39
with the megasas.nodelay=1 allows the system to boot, but it then fails after
some period of just sitting idle. It looks like the megaraid SAS adapter reset
and didn't actually come back up after the reset. I'll attach a console log of
the boot messages and failure messages.

Comment 7 Clark Williams 2007-09-24 16:19:06 UTC
Created attachment 204301 [details]
Boot log of x86_64 kernel-rt-2.6.21-39.el5rt failure on Dell 1950

Comment 8 Arnaldo Carvalho de Melo 2007-10-04 13:47:58 UTC
http://lkml.org/lkml/2007/10/3/283 looks promising, building a kernel to check
that...

Comment 9 Clark Williams 2007-10-09 18:08:51 UTC
I just backported the latest megasas code to the 2.6.21-based RT kernel (driver
version: v00.00.03.16-rc1) and am still seeing the failure. I'm going to build
and test a vanilla version of this kernel to ensure that this is -rt specific,
then move on to using logdev to match up scsi cmds and completions.

Comment 10 Clark Williams 2007-10-10 14:55:34 UTC
I built kernels with the latest megasas driver and tested with the following
configurations:

   PREEMPT_NONE:     runs
   PREEMPT_DESKTOP:  runs
   +PREEMPT_SOFTIRQ: runs
   +PREEMPT_HARDIRQ: fails

So, the point that seems to irritate the megasas driver is running interrupts as
threads. 

I'm going to add rostedt's logdev patch and start logging commands and
completions to try and see if command completions are coming back and we're
losing them somehow, or if the completions are not coming back.

Comment 11 Clark Williams 2007-10-17 18:35:07 UTC
The failure seems to be that the adapter stops interrupting for command
completions. Once that happens, the SCSI middle-layer times out and issues a bus
reset, which calls megasas_generic_reset(). The driver then reclaims all the
commands that have been issued and completed (which is all of the outstanding
commands). At this point the driver says that it's been successfully reset and
returns success to the SCSI layer. Unfortunately, at this point the adapter
never seems to generate another interrupt, even though commands issued to it
seem to complete successfully. 

All my attempts at kicking the firmware to restart interrupt generation have
failed. From inside the reset routine I've tried:

1. Multiple clearing of the interrupt: instance->instancet->clear_intr()
2. Disabling and Enabling interrupts
3. Calling megasas_issue_init_mfi()

I also modified the clear_intr() routines to loop on writing back the interrupt
ack until the status shows that the interrupt has been ack'ed. 

No change in behavior.



Comment 12 Arnaldo Carvalho de Melo 2007-10-17 18:59:23 UTC
If the nodelay=1 parameter is actually passed to the driver with the
IRQF_NODELAY hack, the problem goes away. There were some false alarms about this
hack not being effective, but those were due to the parameter not really being
passed, so the default (running the interrupt handler routine as
a kernel thread) was used, which triggers the problem. One way to be
really sure that the parameter is being passed is to add this entry to
/etc/modprobe.conf:

options scsi_hostadapter nodelay=1

and run mkinitrd again, using '-f' to overwrite the previous initrd file. This
assumes that megaraid_sas is the first scsi host adapter on the machine.
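
Since the false alarms came from the option never reaching the driver, a quick sanity check of the configuration text before rebuilding the initrd may be worthwhile. This is a minimal sketch, assuming the entry is written with modprobe.conf's `options` keyword; the helper name `nodelay_enabled` and the accepted module names are illustrative, not part of any tool.

```python
def nodelay_enabled(conf_text, names=("scsi_hostadapter", "megaraid_sas")):
    """Return True if an 'options' line for one of `names` sets nodelay=1.

    `conf_text` is the content of a modprobe.conf-style file; `names` lists
    the alias or module names to accept (assumed values for this bug).
    """
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(tokens) >= 3 and tokens[0] == "options" and tokens[1] in names:
            if "nodelay=1" in tokens[2:]:
                return True
    return False


example = "alias scsi_hostadapter megaraid_sas\noptions scsi_hostadapter nodelay=1\n"
print(nodelay_enabled(example))
```

This only checks the text of the file passed in. It says nothing about whether the copy of modprobe.conf inside the initrd is current, which still requires rerunning mkinitrd as described above.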

While this doesn't fix the problem, it at least gives us the data point that
registering the megaraid_sas irq handler with IRQF_NODELAY (making it not run
as a kernel thread) allows the driver to run without problems, at least on
machines where megaraid is the only scsi adapter present.

Comment 13 Clark Williams 2007-10-17 19:34:40 UTC
Created attachment 230321 [details]
Patch to allow selecting IRQF_NODELAY over IRQF_SHARED for interrupt

Here is the patch acme mentioned that allows selecting the IRQ behavior. If the
module parameter nodelay is passed in as "nodelay=1" then the interrupt is
registered with the IRQF_NODELAY attribute (rather than IRQF_SHARED). On the RT
kernel this parameter causes the interrupt to be ack'ed in the ISR, rather
than later when the dedicated IRQ thread is scheduled.

Comment 14 Clark Williams 2007-10-17 19:38:07 UTC
Note that after conversations with jburke, acme and lclaudio, my comments in #6
and #7 are incorrect, in that the initrd didn't contain the appropriate
modprobe.conf modifications. It still looks like IRQF_NODELAY is a valid
work-around to this problem.

Comment 15 Clark Williams 2007-10-18 20:21:35 UTC
I just tried the initial build of a 2.6.23-based RT kernel on my 1950 and the
IRQF_NODELAY option did *not* keep the adapter from deciding to stop interrupting. 

So, I think IRQF_NODELAY works for 2.6.21, but fails for 2.6.23. 


Comment 16 Sumant Patro 2007-10-23 21:09:59 UTC
In the test setup, has anyone tried with a different vendor's SCSI/SAS adapter ?

Could you please set the controller to "factory defaults" and see if you still
get the issue?

Is there any FW update available for the controller you are using?

Is it possible that the controller generates interrupt but it never gets
propagated to the driver?

Thanks,

Sumant



Comment 17 Clark Williams 2007-10-23 21:24:27 UTC
Sumant,

I haven't tried a different adapter (it's the default adapter shipped with the
Dell 1950). I'm not sure what replacing that entails and I don't have anything
handy to replace it with. I know we've seen this issue on at least two boxes
other than mine though.

I will reset the controller to factory defaults. 

I did update the firmware to the latest on the Dell website. When I originally
reported this bug, the f/w was at version 5.0.1-003 and I have upgraded it to
5.1.1 (for a PERC 5i). No change in behavior.

I was just getting around to trying to poke at the IOAPIC to see if there's some
problem propagating an interrupt from the adapter. The soft state for the IOAPIC
indicates that the interrupt is /not/ masked, but I haven't confirmed that soft
state matches the actual h/w. 

Clark





Comment 18 Clark Williams 2007-10-23 21:53:55 UTC
Sumant,

I reset the adapter to factory defaults and reran the tests (basically booting
into the -rt kernel, then running the LTP racer.sh script and possibly a 'dd
if=/dev/sda of=/dev/null'), with no change. It takes between 5-15 minutes for
the failure to manifest.

Clark


Comment 19 Arnaldo Carvalho de Melo 2007-12-09 21:51:36 UTC
Current workaround: boot with "noapic". This survived many tests on all the
machines used by me, clark and jburke.

Comment 20 Luis Claudio R. Goncalves 2008-01-02 16:23:50 UTC
I have observed the very same behavior described in this bug report with a
different kernel module, megaraid_mbox. The error messages appeared for both
2.6.21-57.el5rt and -61.el5rt.

Message from loading megaraid_mbox:

Jan  2 07:59:41 dell-pe1850-01 kernel: megaraid cmm: 2.20.2.7 (Release Date: Sun
Jul 16 00:01:03 EST 2006)
Jan  2 07:59:41 dell-pe1850-01 kernel: SCSI subsystem initialized
Jan  2 07:59:41 dell-pe1850-01 kernel: megaraid: 2.20.5.1 (Release Date: Thu Nov
16 15:32:35 EST 2006)
Jan  2 07:59:41 dell-pe1850-01 kernel: megaraid: probe new device
0x1028:0x0013:0x1028:0x016c: bus 2:slot 14:func 0
Jan  2 07:59:41 dell-pe1850-01 kernel: ACPI: PCI Interrupt 0000:02:0e.0[A] ->
GSI 46 (level, low) -> IRQ 46
Jan  2 07:59:41 dell-pe1850-01 kernel: megaraid: fw version:[513O] bios
version:[H418]
Jan  2 07:59:41 dell-pe1850-01 kernel: scsi0 : LSI Logic MegaRAID driver

Error messages:

  megaraid: 1 outstanding commands. Max wait 300 sec
  megaraid mbox: Wait for 0 commands to complete:300
  megaraid mbox: reset sequence completed sucessfully
  megaraid: aborting-12123 cmd=28 <c=1 t=0 l=0>
  megaraid abort: 12123:50[255:128], fw owner
  megaraid: 1 outstanding commands. Max wait 300 sec
  megaraid mbox: Wait for 0 commands to complete:300
  megaraid mbox: reset sequence completed sucessfully
  megaraid: aborting-12124 cmd=28 <c=1 t=0 l=0>
  megaraid abort: 12124:50[255:128], fw owner
  megaraid: 1 outstanding commands. Max wait 300 sec
  megaraid mbox: Wait for 0 commands to complete:300
  megaraid mbox: reset sequence completed sucessfully
  end_request: I/O error, dev sda, sector 5360429


Comment 21 Clark Williams 2008-03-08 15:00:23 UTC
This has been found to be a symptom of a mis-behaving IOAPIC (doesn't like the
-rt way of servicing interrupts, which is to ack+mask, then unmask at some later
point). 

Closing this against the Megaraid SAS driver; I will open a tracker against
misbehaving IOAPICs.
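
To make the ack+mask, then unmask-later pattern concrete, here is a toy model of a level-triggered interrupt line. It is purely illustrative: the bug does not say what exactly the IOAPIC gets wrong, so the `redelivers_on_unmask` flag is a hypothetical failure mode chosen only to show how a completion raised while the line is masked could go unnoticed. All class and function names are made up for this sketch.

```python
class ToyIrqLine:
    """Toy level-triggered interrupt line behind a maskable controller pin."""

    def __init__(self, redelivers_on_unmask=True):
        # Hypothetical knob: a well-behaved controller re-delivers a still
        # asserted line when it is unmasked; the "misbehaving" one does not.
        self.redelivers_on_unmask = redelivers_on_unmask
        self.masked = False
        self.asserted = False  # device is holding the line active
        self.delivered = 0     # interrupts that actually reached the CPU

    def device_assert(self):
        self.asserted = True
        if not self.masked:
            self.delivered += 1

    def device_deassert(self):
        self.asserted = False

    def mask(self):
        self.masked = True

    def unmask(self):
        self.masked = False
        if self.asserted and self.redelivers_on_unmask:
            self.delivered += 1


def threaded_service(line):
    """Replay the -rt style sequence: ack+mask now, handle later, then unmask."""
    line.device_assert()    # completion 1 raises the interrupt
    line.mask()             # hard ISR acks and masks, wakes the IRQ thread
    line.device_deassert()  # IRQ thread services completion 1
    line.device_assert()    # completion 2 arrives while still masked
    line.unmask()           # IRQ thread finishes and unmasks
    return line.delivered


print(threaded_service(ToyIrqLine(redelivers_on_unmask=True)))
print(threaded_service(ToyIrqLine(redelivers_on_unmask=False)))
```

In this model the second completion is only seen when the controller re-delivers on unmask. If it does not, the line stays asserted with no further delivery, which would look like the adapter having stopped interrupting.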