Bug 501880

Summary: soft lockup during boot mptbase on LSI SAS1064 PCI-X Fusion
Product: Red Hat Enterprise Linux 5
Reporter: Igor Smolyar <igor>
Component: kernel
Assignee: David Milburn <dmilburn>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.3
CC: jwest
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-01-17 17:37:59 UTC
Attachments:
  dmesg (flags: none)
  lspci (flags: none)
  dmesg with working driver mptlinux-4.18.00.00-1 (flags: none)
  boot messages of 2.6.18-146.el5 (flags: none)
  boot messages of 2.6.18-146.el5 with pci=noacpi kernel parameter (flags: none)
  lspci -nvvvv output (flags: none)
  boot messages of 2.6.18-151.el5 (flags: none)
  boot messages of 2.6.18-151.el5 with mpt_msi_enabled (flags: none)

Description Igor Smolyar 2009-05-21 08:28:57 UTC
Created attachment 344931 [details]
dmesg

Description of problem:

Cannot install RHEL 5 Update 3 on a system with an LSI Logic SAS1064 PCI-X Fusion storage controller.
mptbase fails to initialize and gets stuck in a soft lockup.

Last messages on serial console:

...
Loading libata.ko module
scsi1 : ata_piix
Loading ata_piix.ko module
ata1: SATA max UDMA/133 cmd 0xef80 ctl 0xef00 bmdma 0xed80 irq 10
ata2: SATA max UDMA/133 cmd 0xee80 ctl 0xee00 bmdma 0xed88 irq 10
Loading dm-mod.ko module
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
Loading mptbase.ko module
Fusion MPT base driver 3.04.07rh
Copyright (c) 1999-2008 LSI Corporation
Loading scsi_transport_sas.ko module
Fusion MPT SAS Host driver 3.04.07rh
mptbase: ioc0: Initiating bringup
Loading mptscsih.ko module
Loading mptsas.ko module
ioc0: LSISAS1064 A4: Capabilities={Initiator}
mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
CPU 2:
Modules linked in: mptsas mptscsih scsi_transport_sas mptbase dm_mod ata_piix ld
Pid: 0, comm: swapper Not tainted 2.6.18-149.el5 #1
RIP: 0010:[<ffffffff8000d1dc>]  [<ffffffff8000d1dc>] __delay+0x8/0x10
RSP: 0000:ffff81011fc6bb38  EFLAGS: 00000287
RAX: 00000000b688fe2e RBX: ffff81011f325000 RCX: 00000000b6754d86
RDX: 000000000000007d RSI: 000000000000012c RDI: 0000000000238072
RBP: ffff81011fc6bab0 R08: ffff81011fc6bc10 R09: 000000000000012c
R10: ffff81011f3253d4 R11: 000000000000004c R12: ffffffff8005ec8e
R13: 000000000004758d R14: ffffffff800787b2 R15: ffff81011fc6bab0
FS:  0000000000000000(0000) GS:ffff81011fc1ee40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000093e013f CR3: 000000011eb73000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff881152cc>] :mptbase:WaitForDoorbellInt+0x62/0xb0
 [<ffffffff88115589>] :mptbase:mpt_handshake_req_reply_wait+0x1d7/0x416
 [<ffffffff88116194>] :mptbase:SendIocInit+0x2f7/0x408
 [<ffffffff88117580>] :mptbase:MakeIocReady+0x14d/0x2cf
 [<ffffffff8811a995>] :mptbase:mpt_do_ioc_recovery+0x1474/0x14ef
 [<ffffffff8014f0f1>] __next_cpu+0x19/0x28
 [<ffffffff8008cc6d>] sched_clock_cpu+0x116/0x124
 [<ffffffff8014f0f1>] __next_cpu+0x19/0x28
 [<ffffffff8008b94a>] find_busiest_group+0x20d/0x621
 [<ffffffff8008c7b5>] __activate_task+0x56/0x6d
 [<ffffffff8006ed98>] do_gettimeofday+0x40/0x8f
 [<ffffffff8005b13e>] getnstimeofday+0x10/0x28
 [<ffffffff8811ab2b>] :mptbase:mpt_HardResetHandler+0x11b/0x1a7
 [<ffffffff8811abb7>] :mptbase:mpt_timer_expired+0x0/0x57
 [<ffffffff8811abdd>] :mptbase:mpt_timer_expired+0x26/0x57
 [<ffffffff80097bc7>] run_timer_softirq+0x133/0x1af
 [<ffffffff80012c24>] __do_softirq+0x89/0x133
 [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006db30>] do_softirq+0x2c/0x85
 [<ffffffff800579cb>] mwait_idle+0x0/0x4a
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80057a01>] mwait_idle+0x36/0x4a
 [<ffffffff80049c16>] cpu_idle+0x95/0xb8
 [<ffffffff80077ebe>] start_secondary+0x45a/0x469

mptbase: ioc0: Initiating recovery
BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
CPU 2:
Modules linked in: mptsas mptscsih scsi_transport_sas mptbase dm_mod ata_piix ld
Pid: 0, comm: swapper Not tainted 2.6.18-149.el5 #1
RIP: 0010:[<ffffffff8000d1dc>]  [<ffffffff8000d1dc>] __delay+0x8/0x10
RSP: 0000:ffff81011fc6bb38  EFLAGS: 00000283
RAX: 000000001423c115 RBX: ffff81011f325000 RCX: 00000000141970fd
RDX: 000000000000008a RSI: 000000000000012c RDI: 0000000000238072
RBP: ffff81011fc6bab0 R08: ffff81011fc6bc10 R09: 000000000000012c
R10: ffff81011f3253d4 R11: 000000000000004c R12: ffffffff8005ec8e
R13: 000000000004758d R14: ffffffff800787b2 R15: ffff81011fc6bab0
FS:  0000000000000000(0000) GS:ffff81011fc1ee40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000093e013f CR3: 000000011eb73000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff881152cc>] :mptbase:WaitForDoorbellInt+0x62/0xb0
 [<ffffffff88115589>] :mptbase:mpt_handshake_req_reply_wait+0x1d7/0x416
 [<ffffffff88116194>] :mptbase:SendIocInit+0x2f7/0x408
 [<ffffffff88117580>] :mptbase:MakeIocReady+0x14d/0x2cf
 [<ffffffff8811a995>] :mptbase:mpt_do_ioc_recovery+0x1474/0x14ef
 [<ffffffff8014f0f1>] __next_cpu+0x19/0x28
 [<ffffffff8006ed98>] do_gettimeofday+0x40/0x8f
 [<ffffffff8008cc6d>] sched_clock_cpu+0x116/0x124
 [<ffffffff8014f0f1>] __next_cpu+0x19/0x28
 [<ffffffff8008b94a>] find_busiest_group+0x20d/0x621
 [<ffffffff8008c7b5>] __activate_task+0x56/0x6d
 [<ffffffff8811ab2b>] :mptbase:mpt_HardResetHandler+0x11b/0x1a7
 [<ffffffff8811abb7>] :mptbase:mpt_timer_expired+0x0/0x57
 [<ffffffff8811abdd>] :mptbase:mpt_timer_expired+0x26/0x57
 [<ffffffff80097bc7>] run_timer_softirq+0x133/0x1af
 [<ffffffff80012c24>] __do_softirq+0x89/0x133
 [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006db30>] do_softirq+0x2c/0x85
 [<ffffffff800579cb>] mwait_idle+0x0/0x4a
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80057a01>] mwait_idle+0x36/0x4a
 [<ffffffff80049c16>] cpu_idle+0x95/0xb8
 [<ffffffff80077ebe>] start_secondary+0x45a/0x469

Version-Release number of selected component (if applicable):
Tested with RHEL 5 Update 3 and with Don Zickus's kernel from http://people.redhat.com/dzickus/el5/149.el5/

We CAN boot the RHEL 5 Update 3 i686 kernel with the pci=noacpi kernel parameter, with no soft lockup errors.

How reproducible:
Try to install the latest RHEL 5 (i686 or x86_64) on a system with an LSI Logic SAS1064 PCI-X Fusion (LSISAS1064) storage controller. A mirror RAID of two SATA disks is configured.

Comment 1 Igor Smolyar 2009-05-21 08:30:17 UTC
Created attachment 344932 [details]
lspci

Comment 2 Igor Smolyar 2009-05-21 12:43:51 UTC
Created attachment 344955 [details]
dmesg with working driver mptlinux-4.18.00.00-1

This is the dmesg from the server booted with mptlinux-4.18.00.00-1 from the LSI Logic site for RHEL5. The driver works correctly on i386 and x86_64.

Comment 3 Igor Smolyar 2009-05-21 12:47:27 UTC
I can confirm that the latest driver for RHEL5 from LSI Logic, mptlinux-4.18.00.00-1, boots correctly on i386 and x86_64. See the attached dmesg.

Comment 4 David Milburn 2009-05-29 14:27:40 UTC
Would you please boot the kernel-2.6.18-146.el5 test kernel?

http://people.redhat.com/dmilburn/

This would have been before the MPT driver was updated, but after an interrupt
storm fix for ESB2. That fix was related to the ata_piix driver, which I see you
are loading although no devices are present. It would be helpful to know
whether this earlier kernel boots on your system. Thanks.

Comment 5 Igor Smolyar 2009-05-31 06:46:40 UTC
Booted kernel-2.6.18-146.el5 from http://people.redhat.com/dmilburn/
with the same result: a soft lockup in mptbase initialization.
See the attached dmesg.

Comment 6 Igor Smolyar 2009-05-31 06:48:17 UTC
Created attachment 345999 [details]
boot messages of 2.6.18-146.el5

These are the boot messages from the 2.6.18-146.el5 kernel. Soft lockup in mptbase initialization.

Comment 7 Igor Smolyar 2009-05-31 06:55:20 UTC
I CAN boot 2.6.18-146.el5 with pci=noacpi. mptbase initialized and loaded with no soft lockups. See boot messages attached.

Comment 8 Igor Smolyar 2009-05-31 06:56:28 UTC
Created attachment 346000 [details]
 boot messages of 2.6.18-146.el5 with pci=noacpi kernel parameter

boot messages of 2.6.18-146.el5 with pci=noacpi kernel parameter. mptbase initialized and loaded.

Comment 9 David Milburn 2009-06-01 20:47:12 UTC
Would you please see if the problem is reproducible on kernel-2.6.18-151.el5.bz501880.1?

http://people.redhat.com/dmilburn/

Also, would you please attach the output of "lspci -xxvvv"?

Comment 10 Igor Smolyar 2009-06-02 06:45:41 UTC
Created attachment 346193 [details]
lspci -nvvvv output

Comment 11 Igor Smolyar 2009-06-02 06:48:52 UTC
Created attachment 346194 [details]
boot messages of 2.6.18-151.el5

kernel-2.6.18-151.el5.bz501880.1.i686.rpm failed to boot with the same error:

                                                                                                             
ioc0: LSISAS1064 A4: Capabilities={Initiator}                                                                                               
mptbase: ioc0: Initiating recovery                                                                                                          
BUG: soft lockup - CPU#3 stuck for 10s! [swapper:0]                                                                                         

Pid: 0, comm:              swapper
EIP: 0060:[<c04ed59f>] CPU: 3     
EIP is at delay_tsc+0xb/0x13      
 EFLAGS: 00000287    Not tainted  (2.6.18-151.el5.bz501880.1 #1)
EAX: c03ab01a EBX: 00238298 ECX: c019a3c8 EDX: 00000018         
ESI: c337a000 EDI: 00001e6c EBP: 00000000 DS: 007b ES: 007b     
CR0: 8005003b CR2: 093cdc8b CR3: 03323000 CR4: 000006d0         
 [<c04ed5d0>] __delay+0x6/0x7                                   
 [<f88cb683>] WaitForDoorbellInt+0x58/0x95 [mptbase]            
 [<f88cb8f3>] mpt_handshake_req_reply_wait+0x1af/0x3d0 [mptbase]
 [<f88cc3fa>] SendIocInit+0x2c5/0x3b1 [mptbase]                 
 [<f88d0045>] mpt_do_ioc_recovery+0x1021/0x1093 [mptbase]       
 [<c041db01>] enqueue_task+0x29/0x39                            
 [<c041db5b>] __activate_task+0x4a/0x59                         
 [<c041e41e>] try_to_wake_up+0x3e8/0x3f2                        
 [<c041db5b>] __activate_task+0x4a/0x59                         
 [<c041db01>] enqueue_task+0x29/0x39                            
 [<c041db5b>] __activate_task+0x4a/0x59                         
 [<c041c9e2>] __wake_up_common+0x2f/0x53                        
 [<f88bd63c>] mptsas_ioc_reset+0xd/0x69 [mptsas]                
 [<f88d01ad>] mpt_HardResetHandler+0xf6/0x173 [mptbase]         
 [<f88d022a>] mpt_timer_expired+0x0/0x4e [mptbase]              
 [<f88d024c>] mpt_timer_expired+0x22/0x4e [mptbase]             
 [<c042c3b4>] run_timer_softirq+0xfb/0x151                      
 [<c0428ee3>] __do_softirq+0x87/0x114                           
 [<c04073eb>] do_softirq+0x52/0x9c                              
 [<c04059d7>] apic_timer_interrupt+0x1f/0x24                    
 [<c041007b>] set_mtrr_prepare_save+0x13/0x6d                   
 [<c0403ce7>] mwait_idle+0x25/0x38                              
 [<c0403ca8>] cpu_idle+0x9f/0xb9                                
 =======================

Comment 12 David Milburn 2009-06-02 17:36:13 UTC
I have a SAS1068E system handy, but I see no problems on it:

#   uname -a
Linux 2.6.18-151.el5.bz501880.1 #1 SMP Mon Jun 1 12:09:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

07:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)

One thing I noticed in your dmesg output for MPTLINUX-4.18.00.00-1:
it looks like MSI is enabled by default.

mptbase: ioc0: PCI-MSI enabled   

The RHEL5 and upstream drivers have this turned off by default. Would you do
one more test, please?

Add this to your /etc/modprobe.conf

options mptbase mpt_msi_enable_sas=1

Rebuild your initrd (any of the kernels that are failing should be fine),
here is a kb article just in case you need it.

http://kbase.redhat.com/faq/docs/DOC-1959

Boot the previously failing kernel without the "pci=noacpi" kernel parameter.
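
The steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function name enable_mpt_msi_sas is hypothetical, the option line is the one quoted in this comment, and the mkinitrd call shown in the trailing comment is the usual RHEL 5 form for the running kernel (the kb article is the authority if it differs).

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat tool): append the mptbase MSI
# option to a modprobe configuration file unless that exact line is already
# present. $1 is the file to edit; it defaults to /etc/modprobe.conf.
enable_mpt_msi_sas() {
    conf="${1:-/etc/modprobe.conf}"
    line="options mptbase mpt_msi_enable_sas=1"
    # -q quiet, -x whole-line match, -F fixed string: keeps the edit idempotent.
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}

# Usage on the affected machine, as root (assumed RHEL 5 commands):
#   enable_mpt_msi_sas /etc/modprobe.conf
#   mkinitrd -f "/boot/initrd-$(uname -r).img" "$(uname -r)"
# Then reboot the previously failing kernel without the pci= workaround.
```

Running the helper twice leaves a single options line in the file, so it is safe to repeat while testing several kernels.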

Comment 13 Igor Smolyar 2009-06-03 07:04:31 UTC
Created attachment 346359 [details]
boot messages of 2.6.18-151.el5 with mpt_msi_enabled

I can boot 2.6.18-151.el5.bz501880.1 with the mptbase MSI option (mpt_msi_enable_sas=1) set.
See boot messages attached.

SCSI subsystem initialized                                                      
Fusion MPT base driver 3.04.07rh                                                
Copyright (c) 1999-2008 LSI Corporation                                         
Fusion MPT SAS Host driver 3.04.07rh                                            
mptbase: ioc0: Initiating bringup                                               
ioc0: LSISAS1064 A4: Capabilities={Initiator}                                   
mptbase: ioc0: PCI-MSI enabled                                                  
scsi0 : ioc0: LSISAS1064 A4, FwRev=01120000h, Ports=1, MaxQ=511, IRQ=106        
  Vendor: ATA       Model: WDC WD1601ABYS-0  Rev: 6H05                          
  Type:   Direct-Access                      ANSI SCSI revision: 05             
  Vendor: ATA       Model: WDC WD1601ABYS-0  Rev: 6H05                          
  Type:   Direct-Access                      ANSI SCSI revision: 05             
  Vendor: LSILOGIC  Model: Logical Volume    Rev: 3000                          
  Type:   Direct-Access                      ANSI SCSI revision: 02             
SCSI device sda: 320311296 512-byte hdwr sectors (163999 MB)                    
sda: Write Protect is off                                                       
sda: Mode Sense: 03 00 00 08                                                    
SCSI device sda: drive cache: write through                                     
SCSI device sda: 320311296 512-byte hdwr sectors (163999 MB)                    
sda: Write Protect is off                                                       
sda: Mode Sense: 03 00 00 08                                                    
SCSI device sda: drive cache: write through                                     
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >                                    
sd 0:1:0:0: Attached scsi disk sda

Comment 14 Igor Smolyar 2009-06-03 07:17:33 UTC
1. mptbase 3.04.07rh initializes properly if I use the pci=noacpi kernel boot parameter.

2. The MPT base driver 4.18.00.00, downloaded from the LSI Logic site, has MSI enabled by default.
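
Whether MSI was actually in use for a given boot can be read from the saved boot messages, since mptbase prints "PCI-MSI enabled" for the controller when it is (as in the log in comment 13). A minimal sketch, assuming the messages were saved to a file; the function name mpt_msi_status is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report whether a saved dmesg/boot log shows mptbase
# enabling MSI for any controller (lines like "mptbase: ioc0: PCI-MSI enabled").
# Prints "enabled" or "not reported"; $1 is the log file.
mpt_msi_status() {
    if grep -Eq 'mptbase: ioc[0-9]+: PCI-MSI enabled' "$1"; then
        echo "enabled"
    else
        echo "not reported"
    fi
}

# Usage: dmesg > /tmp/boot.log; mpt_msi_status /tmp/boot.log
```

"not reported" only means the line is absent from that log; it does not by itself prove which interrupt mode the driver used.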

Comment 17 Jeremy West 2012-01-17 17:37:59 UTC
Igor,

This bug has been open for a while, and judging from the last updates it seems it can be closed. Given the age of this BZ, I'm going to go ahead and close it. If you disagree and have additional information you'd like to share, please feel free to reopen it.

Thanks
Jeremy West
Red Hat Support Supervisor