Bug 501880
| Summary: | soft lockup during boot mptbase on LSI SAS1064 PCI-X Fusion | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Igor Smolyar <igor> |
| Component: | kernel | Assignee: | David Milburn <dmilburn> |
| Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.3 | CC: | jwest |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-01-17 17:37:59 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Igor Smolyar
2009-05-21 08:28:57 UTC
Created attachment 344932 [details]
lspci
Created attachment 344955 [details]
dmesg with working driver mptlinux-4.18.00.00-1
This is the dmesg from the server booted with mptlinux-4.18.00.00-1 from the LSI Logic site for RHEL5. The driver works correctly on i386 and x86_64.
I can confirm that the latest driver for RHEL5 from LSI Logic, mptlinux-4.18.00.00-1, boots correctly on i386 and x86_64. See dmesg attached.
Would you please boot the kernel-2.6.18-146.el5 test kernel? http://people.redhat.com/dmilburn/
This would have been before the MPT driver was updated, but after an interrupt storm fix for ESB2. This fix was related to the ata_piix driver, which I see you are loading, but no devices are present. It would be helpful to know if this earlier kernel boots on your system. Thanks.
Booting kernel-2.6.18-146.el5 from http://people.redhat.com/dmilburn/ gives the same result: soft lockup in mptbase initialization. See dmesg attached.
Created attachment 345999 [details]
boot messages of 2.6.18-146.el5
These are the boot messages from the 2.6.18-146.el5 kernel. Soft lockup in mptbase initialization.
I CAN boot 2.6.18-146.el5 with pci=noacpi. mptbase initialized and loaded with no soft lockups. See boot messages attached.
Created attachment 346000 [details]
boot messages of 2.6.18-146.el5 with pci=noacpi kernel parameter
boot messages of 2.6.18-146.el5 with pci=noacpi kernel parameter. mptbase initialized and loaded.
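When comparing boots with and without pci=noacpi, it can help to confirm which parameters the running kernel actually received. The following is a minimal sketch, assuming a Linux host that exposes /proc/cmdline; the function name has_boot_param is illustrative and does not come from the bug report.

```shell
#!/bin/sh
# Sketch only: has_boot_param is a hypothetical helper name.

# Succeeds if the exact parameter (for example pci=noacpi) appears on the
# kernel command line. The second argument defaults to /proc/cmdline and can
# point at a saved copy of a command line instead.
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then require a whole-line,
    # fixed-string match so that "noacpi" alone does not match "pci=noacpi".
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Example use on the booted system:
#   if has_boot_param pci=noacpi; then
#       echo "booted with pci=noacpi"
#   else
#       echo "booted without pci=noacpi"
#   fi
```

Matching whole parameters rather than substrings keeps similar spellings such as pci=noapic and pci=noacpi distinct.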
Would you please see if the problem is reproducible on kernel-2.6.18-151.el5.bz501880.1? http://people.redhat.com/dmilburn/
Also, would you please attach the output of "lspci -xxvvv"?
Created attachment 346193 [details]
lspci -nvvvv output
Created attachment 346194 [details]
boot messages of 2.6.18-151.el5
kernel-2.6.18-151.el5.bz501880.1.i686.rpm failed to boot with the same error:
ioc0: LSISAS1064 A4: Capabilities={Initiator}
mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#3 stuck for 10s! [swapper:0]
Pid: 0, comm: swapper
EIP: 0060:[<c04ed59f>] CPU: 3
EIP is at delay_tsc+0xb/0x13
EFLAGS: 00000287 Not tainted (2.6.18-151.el5.bz501880.1 #1)
EAX: c03ab01a EBX: 00238298 ECX: c019a3c8 EDX: 00000018
ESI: c337a000 EDI: 00001e6c EBP: 00000000 DS: 007b ES: 007b
CR0: 8005003b CR2: 093cdc8b CR3: 03323000 CR4: 000006d0
 [<c04ed5d0>] __delay+0x6/0x7
 [<f88cb683>] WaitForDoorbellInt+0x58/0x95 [mptbase]
 [<f88cb8f3>] mpt_handshake_req_reply_wait+0x1af/0x3d0 [mptbase]
 [<f88cc3fa>] SendIocInit+0x2c5/0x3b1 [mptbase]
 [<f88d0045>] mpt_do_ioc_recovery+0x1021/0x1093 [mptbase]
 [<c041db01>] enqueue_task+0x29/0x39
 [<c041db5b>] __activate_task+0x4a/0x59
 [<c041e41e>] try_to_wake_up+0x3e8/0x3f2
 [<c041db5b>] __activate_task+0x4a/0x59
 [<c041db01>] enqueue_task+0x29/0x39
 [<c041db5b>] __activate_task+0x4a/0x59
 [<c041c9e2>] __wake_up_common+0x2f/0x53
 [<f88bd63c>] mptsas_ioc_reset+0xd/0x69 [mptsas]
 [<f88d01ad>] mpt_HardResetHandler+0xf6/0x173 [mptbase]
 [<f88d022a>] mpt_timer_expired+0x0/0x4e [mptbase]
 [<f88d024c>] mpt_timer_expired+0x22/0x4e [mptbase]
 [<c042c3b4>] run_timer_softirq+0xfb/0x151
 [<c0428ee3>] __do_softirq+0x87/0x114
 [<c04073eb>] do_softirq+0x52/0x9c
 [<c04059d7>] apic_timer_interrupt+0x1f/0x24
 [<c041007b>] set_mtrr_prepare_save+0x13/0x6d
 [<c0403ce7>] mwait_idle+0x25/0x38
 [<c0403ca8>] cpu_idle+0x9f/0xb9
=======================
I have a SAS1068E system handy, no problems seen on it though
# uname -a
Linux 2.6.18-151.el5.bz501880.1 #1 SMP Mon Jun 1 12:09:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
One thing I noticed from your dmesg output for the MPTLINUX-4.18.00.00-1: it looks like MSI is enabled by default.
mptbase: ioc0: PCI-MSI enabled
The RHEL5 and upstream drivers have this turned off by default. Would you do one more test please?
Add this to your /etc/modprobe.conf
options mptbase mpt_msi_enable_sas=1
Rebuild your initrd (any of the kernels that are failing should be fine); here is a kb article just in case you need it. http://kbase.redhat.com/faq/docs/DOC-1959
Boot the previously failing kernel without the "pci=noacpi" kernel parameter.
Created attachment 346359 [details]
boot messages of 2.6.18-151.el5 with mpt_msi_enabled
I can boot 2.6.18-151.el5.bz501880.1 with the mpt_msi_enable_sas=1 option. See boot messages attached.
SCSI subsystem initialized
Fusion MPT base driver 3.04.07rh
Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SAS Host driver 3.04.07rh
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1064 A4: Capabilities={Initiator}
mptbase: ioc0: PCI-MSI enabled
scsi0 : ioc0: LSISAS1064 A4, FwRev=01120000h, Ports=1, MaxQ=511, IRQ=106
Vendor: ATA Model: WDC WD1601ABYS-0 Rev: 6H05
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: WDC WD1601ABYS-0 Rev: 6H05
Type: Direct-Access ANSI SCSI revision: 05
Vendor: LSILOGIC Model: Logical Volume Rev: 3000
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 320311296 512-byte hdwr sectors (163999 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 08
SCSI device sda: drive cache: write through
SCSI device sda: 320311296 512-byte hdwr sectors (163999 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 08
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
sd 0:1:0:0: Attached scsi disk sda
1. mptbase 3.04.07rh initializes properly if I use the pci=noacpi kernel boot parameter.
2. MPT base driver 4.18.00.00, downloaded from the LSI Logic site, has MSI enabled by default.
Igor,
This bug has been open a while and, from the last updates to it, it seems we can close it. Based on the age of this BZ I'm going to go ahead and close it. If you disagree and have additional information you'd like to share, please feel free to reopen this.
Thanks
Jeremy West
Red Hat Support Supervisor
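The MSI test requested in the thread (add the mptbase option to /etc/modprobe.conf, then rebuild the initrd) could be scripted roughly as follows. This is a sketch under assumptions: the helper name add_mpt_msi_option and the variables OPTION_LINE and KVER are illustrative, and the mkinitrd invocation is shown only as a comment because it must be run as root against the version of the kernel that fails to boot.

```shell
#!/bin/sh
# Sketch only: add_mpt_msi_option, OPTION_LINE and KVER are hypothetical names.

# The option line quoted in the thread.
OPTION_LINE="options mptbase mpt_msi_enable_sas=1"

# Append the option line to the given modprobe config file unless an
# identical line is already present, so repeated runs do not duplicate it.
add_mpt_msi_option() {
    conf="$1"
    if ! grep -qxF -- "$OPTION_LINE" "$conf" 2>/dev/null; then
        printf '%s\n' "$OPTION_LINE" >> "$conf"
    fi
}

# On the affected host (as root), with KVER set to the version of the
# kernel that fails to boot:
#   add_mpt_msi_option /etc/modprobe.conf
#   mkinitrd -f "/boot/initrd-$KVER.img" "$KVER"
```

The initrd rebuild matters because mptbase is loaded from the initrd during boot, so a change to /etc/modprobe.conf alone would not reach the driver at that stage.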