Bug 159518 - Fusion MPT driver causes SCSI Bus Resets
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium  Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Reported: 2005-06-03 10:14 EDT by ed2019
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-09-19 14:36:31 EDT


Attachments
Result of setting scsi_debug, and modprobe mptbase and mptscsih (8.81 KB, text/plain)
2005-06-09 09:37 EDT, ed2019

Description ed2019 2005-06-03 10:14:25 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050517 Firefox/1.0.4 (Debian package 1.0.4-2)

Description of problem:
I have one LSI 22320-R card connected to an external RAID box over two SCSI channels.  The RAID box presents two drives per channel, so we see sda, sdb, sdc, and sdd.

Everything starts out OK and the drives are accessible.  After a few minutes, applications accessing the drives will hang, and the error messages (attached below) will show up in the logs.


Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.EL

How reproducible:
Always

Steps to Reproduce:
1. Install an LSI 22320-R card in an HP DL360G4 server and install RHEL4.
2. Connect to two SCSI devices via the external channel connectors.
3. Try reading from and writing to the connected disks.
  

Actual Results:  The errors shown below (task aborts, bus resets, host resets, offlined devices) occurred.

Expected Results:  No errors, disk access works fine.

Additional info:


May 25 20:03:02 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a080)
May 25 20:03:02 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a080)
May 25 20:03:02 alpenwurst kernel: mptbase: ioc0: IOCStatus(0x0048):
SCSI Task Terminated
May 25 20:03:02 alpenwurst kernel: mptbase: ioc0: IOCStatus(0x0048):
SCSI Task Terminated
May 25 20:03:07 alpenwurst kernel: mptscsih: ioc1: >> Attempting task
abort! (sc=f74a9080)
May 25 20:03:07 alpenwurst kernel: mptscsih: ioc1: >> Attempting task
abort! (sc=f74a9080)
May 25 20:03:07 alpenwurst kernel: mptbase: ioc1: IOCStatus(0x0048):
SCSI Task Terminated
May 25 20:03:07 alpenwurst kernel: mptbase: ioc1: IOCStatus(0x0048):
SCSI Task Terminated
May 25 20:03:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a080)
May 25 20:03:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a080)
May 25 20:03:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a800)
May 25 20:03:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting task
abort! (sc=f752a800)
May 25 20:03:14 alpenwurst kernel: mptbase: Initiating ioc0 recovery
May 25 20:03:14 alpenwurst kernel: mptbase: Initiating ioc0 recovery
May 25 20:03:16 alpenwurst kernel: mptscsih: ioc1: >> Attempting bus
reset! (sc=f74a9080)
May 25 20:03:16 alpenwurst kernel: mptscsih: ioc1: >> Attempting bus
reset! (sc=f74a9080)
May 25 20:03:32 alpenwurst kernel: mptscsih: ioc0: >> Attempting bus
reset! (sc=f752a080)
May 25 20:03:32 alpenwurst kernel: mptscsih: ioc0: >> Attempting bus
reset! (sc=f752a080)
May 25 20:04:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting host
reset! (sc=f752a080)
May 25 20:04:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting host
reset! (sc=f752a080)
May 25 20:04:52 alpenwurst kernel: scsi: Device offlined - not ready
after error recovery: host 0 channel 0 id 2 lun 0
May 25 20:04:52 alpenwurst kernel: scsi: Device offlined - not ready
after error recovery: host 0 channel 0 id 2 lun 0
May 25 20:04:52 alpenwurst last message repeated 2 times
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
May 25 20:04:52 alpenwurst last message repeated 2 times
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
May 25 20:04:52 alpenwurst kernel: Buffer I/O error on device sdb1,
logical block 160829085
May 25 20:04:52 alpenwurst kernel: Buffer I/O error on device sdb1,
logical block 160829085
May 25 20:04:52 alpenwurst kernel: lost page write due to I/O error on sdb1
May 25 20:04:52 alpenwurst kernel: lost page write due to I/O error on sdb1
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
May 25 20:04:52 alpenwurst kernel: Buffer I/O error on device sdb1,
logical block 227169354
May 25 20:04:52 alpenwurst kernel: scsi0 (2:0): rejecting I/O to offline
device
Comment 1 Tom Coughlan 2005-06-07 15:52:18 EDT
This appears to be a command timeout. Do you have some confidence in the SCSI
bus configuration? Check bus length, terminators, cables, all that.
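
One way to see the escalation in the log from the description (task abort, then bus reset, then host reset) is to tally the mptscsih recovery messages per controller. The following is only a minimal sketch, not a diagnostic tool: it assumes the wrapped syslog lines have been rejoined onto single lines, and the names SAMPLE, PATTERN, and tally_recovery are hypothetical. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# A few lines from the reported log, rejoined onto single lines (assumption:
# the wrapping in the report is a display artifact).
SAMPLE = """\
May 25 20:03:02 alpenwurst kernel: mptscsih: ioc0: >> Attempting task abort! (sc=f752a080)
May 25 20:03:16 alpenwurst kernel: mptscsih: ioc1: >> Attempting bus reset! (sc=f74a9080)
May 25 20:03:32 alpenwurst kernel: mptscsih: ioc0: >> Attempting bus reset! (sc=f752a080)
May 25 20:04:12 alpenwurst kernel: mptscsih: ioc0: >> Attempting host reset! (sc=f752a080)
"""

# Matches the three recovery actions that appear in the reported log.
PATTERN = re.compile(
    r"mptscsih: (ioc\d+): >> Attempting (task abort|bus reset|host reset)!"
)

def tally_recovery(text):
    """Count recovery attempts, keyed by (controller, action)."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

counts = tally_recovery(SAMPLE)
for (ioc, action), n in sorted(counts.items()):
    print(ioc, action, n)
```

Run against the full /var/log/messages text instead of SAMPLE, this would show whether both ioc0 and ioc1 reach the later recovery stages or only one of them.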

You might try using the mpt fusion BIOS to lower the transfer speed, and disable
domain validation if possible, to see if that makes a difference. (I am not
suggesting this as a permanent solution, just for debugging the problem.)

Please post /var/log/messages showing the HBA being configured. 

Also rmmod the fusion driver, then

# sysctl -w dev.scsi.logging_level=0x0000003d

This turns on error logging and timeout logging in the SCSI midlayer. Then
modprobe the fusion driver and post /var/log/messages. Set logging_level back to
zero.
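
To illustrate why 0x0000003d corresponds to error and timeout logging, here is a minimal sketch that decodes the value. It assumes the field layout from the kernel's scsi_logging.h (ten 3-bit fields, starting with error at bit 0 and timeout at bit 3); the names FACILITIES and decode_logging_level are hypothetical helpers, not part of any tool.

```python
# Assumed layout (from the kernel's scsi_logging.h): each facility gets a
# 3-bit level, packed from the least significant bit upward in this order.
FACILITIES = [
    "error", "timeout", "scan", "mlqueue", "mlcomplete",
    "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
]

def decode_logging_level(value):
    """Split a scsi logging_level value into per-facility levels (0-7)."""
    return {
        name: (value >> (3 * index)) & 0x7
        for index, name in enumerate(FACILITIES)
    }

levels = decode_logging_level(0x0000003D)
enabled = {name: level for name, level in levels.items() if level}
print(enabled)
```

Under that assumed layout, only the error and timeout fields are non-zero for this value, which matches the description above.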

We are testing an update for the mpt fusion driver. If you would like to try it
I can make it available to you. 
Comment 2 ed2019 2005-06-09 08:55:20 EDT
The SCSI config was known to work under RHEL3.  We have also exchanged all the
components including servers, SCSI cards, cables, and targets, and the errors
persist.

I'd love to try the new mpt fusion driver if you think it will work.  We
received a driver update from LSI (version 3.2.19) but it has not worked.  I'll
attach some of the new output today.

I'll try the scsi logging stuff and send you the results.
Comment 3 ed2019 2005-06-09 09:01:09 EDT
* Normal SCSI Logs from dmesg:

Fusion MPT base driver 3.01.16
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI interrupt 0000:0a:01.0[A] -> GSI 72 (level, low) -> IRQ 201
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
ACPI: PCI interrupt 0000:0a:01.1[B] -> GSI 73 (level, low) -> IRQ 209
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.01.16
scsi0 : ioc0: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=201
  Vendor: JetStor   Model: Volume Set # 00   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 2343745536 512-byte hdwr sectors (1199998 MB)
SCSI device sda: drive cache: write back
 sda: sda1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: JetStor   Model: Volume Set # 01   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdb: 2343745536 512-byte hdwr sectors (1199998 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1
Attached scsi disk sdb at scsi0, channel 0, id 2, lun 0
scsi1 : ioc1: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=209
  Vendor: JetStor   Model: Volume Set # 02   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdc: 2343745536 512-byte hdwr sectors (1199998 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1
Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0
  Vendor: JetStor   Model: Volume Set # 03   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdd: 2343745536 512-byte hdwr sectors (1199998 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1
Attached scsi disk sdd at scsi1, channel 0, id 2, lun 0
Comment 4 ed2019 2005-06-09 09:11:42 EDT
SCSI info from messages:
Jun  9 09:04:18 alpenwurst kernel: mptbase: Initiating ioc0 bringup
Jun  9 09:04:18 alpenwurst kernel: mptbase: Initiating ioc0 bringup
Jun  9 09:04:18 alpenwurst kernel: mptbase: Initiating ioc1 bringup
Jun  9 09:04:18 alpenwurst kernel: mptbase: Initiating ioc1 bringup
Jun  9 09:04:18 alpenwurst kernel: ioc1: 53C1030: Capabilities={Initiator}
Jun  9 09:04:18 alpenwurst kernel: ioc1: 53C1030: Capabilities={Initiator}
Jun  9 09:04:18 alpenwurst kernel: Fusion MPT SCSI Host driver 3.01.16
Jun  9 09:04:18 alpenwurst kernel: Fusion MPT SCSI Host driver 3.01.16
Jun  9 09:04:19 alpenwurst kernel: scsi0 : ioc0: LSI53C1030, FwRev=01030700h,
Ports=1, MaxQ=222, IRQ=201
Jun  9 09:04:19 alpenwurst kernel: scsi0 : ioc0: LSI53C1030, FwRev=01030700h,
Ports=1, MaxQ=222, IRQ=201
Jun  9 09:04:19 alpenwurst kernel:   Vendor: JetStor   Model: Volume Set # 00
Rev: R001
Jun  9 09:04:19 alpenwurst kernel:   Vendor: JetStor   Model: Volume Set # 00
Rev: R001
Jun  9 09:04:19 alpenwurst kernel:   Type:   Direct-Access
ANSI SCSI revision: 03
Jun  9 09:04:19 alpenwurst kernel:   Type:   Direct-Access
ANSI SCSI revision: 03
Jun  9 09:04:19 alpenwurst kernel: SCSI device sda: 2343745536 512-byte hdwr
sectors (1199998 MB)
Jun  9 09:04:19 alpenwurst kernel: SCSI device sda: 2343745536 512-byte hdwr
sectors (1199998 MB)
Jun  9 09:04:19 alpenwurst kernel: SCSI device sda: drive cache: write back
Jun  9 09:04:19 alpenwurst kernel: SCSI device sda: drive cache: write back
Jun  9 09:04:19 alpenwurst kernel:  sda: sda1
Jun  9 09:04:19 alpenwurst kernel:  sda: sda1
...
Comment 5 ed2019 2005-06-09 09:37:37 EDT
Created attachment 115259 [details]
Result of setting scsi_debug, and modprobe mptbase and mptscsih

Stock kernel and driver from RHEL4.
Comment 6 Tom Coughlan 2005-06-09 10:53:22 EDT
Please post a log with dev.scsi.logging_level=0x0000003d that shows the I/O
errors happening.

> Vendor: JetStor   Model: Volume Set # 01   Rev: R001

Maybe see if they have any firmware updates for this?
Comment 7 Tom Coughlan 2005-09-19 14:36:31 EDT
This BZ has been in NEEDINFO for three months. We will assume the problem was
not reproducible or has been fixed by a firmware update, or by a later RHEL 4
update. If this problem still exists, please reopen and provide more info.
