Bug 175385

Summary: HP StorageWorks DAT 72 doesn't work with mptscsih driver
Product: Red Hat Enterprise Linux 3 Reporter: David Milburn <dmilburn>
Component: kernel Assignee: Tom Coughlan <coughlan>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: coldwell, petrides, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-12-21 15:55:51 UTC Type: ---
Bug Depends On:    
Bug Blocks: 170417    
Attachments:
Description Flags
dmesg from 2.4.21-4 none
dmesg from 2.4.21-38 none

Description David Milburn 2005-12-09 18:21:12 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7

Description of problem:
mptscsih driver resets the SCSI bus followed by ABORTS during startup

Here is a portion of dmesg:

mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.16.02
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=25
blk: queue c8158e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
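
To see at a glance which target the timeouts are hitting, the abort lines in a saved dmesg can be tallied. This is only a minimal sketch: the regular expression is written against the exact message text quoted in this report, and the function name `count_aborts` is made up for illustration.

```python
import re
from collections import Counter

# Pattern for the SCSI mid-layer timeout message as it appears in this report.
ABORT_RE = re.compile(
    r"aborting command due to timeout : pid (\d+), "
    r"scsi(\d+), channel (\d+), id (\d+), lun (\d+)"
)

def count_aborts(dmesg_text):
    """Count timeout aborts per (host, channel, id, lun). Hypothetical helper."""
    counts = Counter()
    for m in ABORT_RE.finditer(dmesg_text):
        _pid, host, channel, target, lun = (int(g) for g in m.groups())
        counts[(host, channel, target, lun)] += 1
    return counts

sample = """\
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
"""

for address, n in count_aborts(sample).items():
    print(address, n)
```

If every abort lands on the same address, as in the excerpt above, the problem is likely confined to that one device rather than the whole bus.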




Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Connect HP StorageWorks DAT 72 Tape Drive to LSI Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI controller.
2. Boot system.
  

Actual Results:  Tape drive is not accessible.

Expected Results:  Tape drive should be accessible.

Additional info:

The problem first appears in 2.4.21-20.0.1.ELsmp:

2.4.21-4.ELsmp - works
2.4.21-9.0.3.ELsmp - works
2.4.21-15.0.4.ELsmp - works
2.4.21-20.0.1.ELsmp - no longer works

These kernels were tested on the same system.
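
The test results above bracket the regression between two releases. A small sketch of how that window could be derived from a table of results follows. The helper names are hypothetical, and the sort key only compares the leading numeric fields of the kernel string; it is not a full RPM version comparison.

```python
def release_key(kernel):
    """Turn '2.4.21-20.0.1.ELsmp' into a sortable tuple such as (2, 4, 21, 20, 0, 1)."""
    version, _, release = kernel.partition("-")
    nums = [int(n) for n in version.split(".")]
    # Keep the leading numeric release fields and stop at the 'ELsmp' suffix.
    for part in release.split("."):
        if not part.isdigit():
            break
        nums.append(int(part))
    return tuple(nums)

def regression_window(results):
    """Return (last kernel that worked, first kernel that failed)."""
    last_good = None
    for kernel in sorted(results, key=release_key):
        if results[kernel]:
            last_good = kernel
        else:
            return last_good, kernel
    return last_good, None

# Results as reported in this bug (True = tape drive accessible).
results = {
    "2.4.21-4.ELsmp": True,
    "2.4.21-9.0.3.ELsmp": True,
    "2.4.21-15.0.4.ELsmp": True,
    "2.4.21-20.0.1.ELsmp": False,
}

print(regression_window(results))
```

Any driver change that went in between those two releases would be the first place to look.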

Comment 1 David Milburn 2005-12-09 18:27:17 UTC
Here is a portion of the dmesg output for 2.4.21-20.0.1.ELsmp with debugging
turned on in the mptscsih driver. The tape device (scsi0, id 3, lun 0) reports
that it is busy, which leads to the bus resets and aborts.

mptscsih: ioc0: ScsiDone (mf=c81806c0,mr=c818c480,sc=c8570000,idx=18)
  Uh-Oh! (0:3:0) mf=c81806c0, mr=c818c480, sc=c8570000
  IOCStatus=0000h, SCSIState=01h, SCSIStatus=02h, IOCLogInfo=00000000h
  sc->result set to 00000002h
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
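
One way to double-check the status line in the debug output is to decode it. This is a minimal sketch that assumes the SCSIStatus field carries the raw SCSI status byte as defined by the SCSI Architecture Model; that is an assumption about this driver's debug output, not something confirmed in this report. Under that assumption 02h would read as CHECK CONDITION, while BUSY would be 08h, so the sense data may be worth a look as well.

```python
import re

# Status codes from the SCSI Architecture Model. Treating the driver's
# SCSIStatus field as this raw byte is an assumption.
SAM_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}

# Matches name=HEXh pairs such as 'SCSIStatus=02h'.
FIELD_RE = re.compile(r"(\w+)=([0-9A-Fa-f]+)h")

def parse_status_line(line):
    """Parse the hex fields of the debug line into a dict. Hypothetical helper."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(line)}

line = "  IOCStatus=0000h, SCSIState=01h, SCSIStatus=02h, IOCLogInfo=00000000h"
fields = parse_status_line(line)
status_name = SAM_STATUS.get(fields["SCSIStatus"], "unknown")
print(f"SCSIStatus={fields['SCSIStatus']:02x}h -> {status_name}")
```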


Comment 3 Tom Coughlan 2006-01-07 19:35:11 UTC
Here is a bit of status so far, but no conclusion yet.

I ran 2.4.21-20.ELsmp and had no problem at all:

kernel: Fusion MPT base driver 2.05.16
kernel: Copyright (c) 1999-2004 LSI Logic Corporation
kernel: mptbase: Initiating ioc0 bringup
kernel: ioc0: 53C1030: Capabilities={Initiator,Target}
kernel: mptbase: Initiating ioc1 bringup
kernel: ioc1: 53C1030: Capabilities={Initiator,Target}
kernel: mptbase: 2 MPT adapters found, 2 installed.
kernel: Fusion MPT SCSI Host driver 2.05.16
kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
kernel: scsi3 : ioc1: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
kernel: blk: queue f7600e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel:   Vendor: SEAGATE   Model: DAT    DAT72-000  Rev: A060
kernel:   Type:   Sequential-Access                  ANSI SCSI revision: 03
kernel: blk: queue f6ec5618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: Attached scsi tape st0 at scsi2, channel 0, id 6, lun 0
kernel: resize_dma_pool: unknown device type 12
kernel: SCSI device sdd: 0 512-byte hdwr sectors (0 MB)
kernel: blk: queue f6ec5218, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: resize_dma_pool: unknown device type 12
kernel: SCSI device sdd: 0 512-byte hdwr sectors (0 MB)


This driver and kernel version are slightly different from the customer's. My
next step is to re-test with the driver the customer has.

It might be helpful to get the tape model and revision info from the customer's
system, so we can see if it matches what I have. It will be in dmesg on one of
the earlier kernels that works.

The driver was updated in U6, so if the customer could try that (or U7 beta) it
may also provide some useful info. 

Comment 5 David Milburn 2006-01-09 17:07:14 UTC
Here is the tape model and revision from the 2.4.21-4.ELsmp dmesg:

Fusion MPT base driver 2.05.05+
Copyright (c) 1999-2002 LSI Logic Corporation
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.05+
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=25
Starting timer : 0 0
blk: queue c6d70e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: HP        Model: C7438A            Rev: V312
  Type:   Sequential-Access                  ANSI SCSI revision: 03
Starting timer : 0 0
blk: queue c6d70c18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
mptscsih: ioc0: scsi0: Id=3 Lun=0: Queue depth=1


Comment 6 Tom Coughlan 2006-01-16 01:00:02 UTC
Okay, I tested the same kernel and driver as the customer:

2.4.21-32.0.1.ELsmp 
2.05.16.02

on the DAT 72 tape model that I have. No problem. I also tested 2.4.21-20.ELsmp
(mentioned above), and 2.4.21-38.ELsmp (U7 beta, with driver version
2.06.16.01). No problem. 

So, the issue may be specific to the particular HBA model, HBA FW version
(01032700h vs my 01032740h), the difference between the HP and Seagate versions
of this drive, or something else in the kernel. Sometimes when all the commands
issued to the drive time out, it is due to an interrupt routing problem.
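
The adapter lines in dmesg carry both the firmware revision and the IRQ, so two captures can be compared mechanically. The sketch below assumes the "scsiN : iocN: ..." line format shown in this report; the helper names are hypothetical. The same comparison could be run on a working and a failing boot of the customer's system to spot IRQ changes.

```python
import re

# Matches the per-adapter line printed by the Fusion MPT SCSI host driver.
ADAPTER_RE = re.compile(
    r"scsi(\d+) : (ioc\d+): (\w+), FwRev=([0-9A-Fa-f]+)h, "
    r"Ports=(\d+), MaxQ=(\d+), IRQ=(\d+)"
)

def parse_adapters(dmesg_text):
    """Map each ioc name to its chip, firmware revision and IRQ."""
    adapters = {}
    for m in ADAPTER_RE.finditer(dmesg_text):
        _host, ioc, chip, fw, _ports, _maxq, irq = m.groups()
        adapters[ioc] = {"chip": chip, "fw": fw.lower(), "irq": int(irq)}
    return adapters

def diff_adapters(a, b):
    """Return the fields that differ for adapters present in both captures."""
    diffs = {}
    for ioc in sorted(set(a) & set(b)):
        changed = {k: (a[ioc][k], b[ioc][k]) for k in a[ioc] if a[ioc][k] != b[ioc][k]}
        if changed:
            diffs[ioc] = changed
    return diffs

# Adapter lines copied from this report: customer system vs. the test system.
customer = parse_adapters(
    "scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24\n"
)
test_box = parse_adapters(
    "kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19\n"
)

print(diff_adapters(customer, test_box))
```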

Can the customer try the U7 beta, just so they are running the latest? Next,
ask them to post the full dmesg after booting a kernel that works and one that
does not work, so we can look at interrupt issues, etc. Are they willing to run
some test drivers for us? Also ask them to check with HP and see if they have
the latest drive firmware.

I guess I'll need to ask HP to summarize the differences between the drive models. 

Comment 7 Brad Hinson 2006-01-17 16:32:04 UTC
Created attachment 123306
dmesg from 2.4.21-4

Comment 8 Brad Hinson 2006-01-17 16:33:20 UTC
Created attachment 123307
dmesg from 2.4.21-38