From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7

Description of problem:
mptscsih driver resets the SCSI bus followed by ABORTS during startup

Here is a portion of dmesg:

mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.16.02
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=25
blk: queue c8158e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8158000)

Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Connect HP Storage Works DAT 72 Tape Drive to LSI Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI controller.
2. Boot system.

Actual Results:
Tape drive is not accessible.

Expected Results:
Tape drive should be accessible.

Additional info:
This breaks in 2.4.21-20.0.1.EL

2.4.21-4.ELsmp - works
2.4.21-9.0.3.ELsmp - works
2.4.21-15.0.4.ELsmp - works
2.4.21-20.0.1.ELsmp - no longer works

These kernels were tested on the same system.
Here is a portion of the dmesg output for 2.4.21-20.0.1.ELsmp with debugging turned on in the mptscsih driver. The tape device (scsi0, id 3, lun 0) reports that it is busy, which leads to the bus resets and aborts.

mptscsih: ioc0: ScsiDone (mf=c81806c0,mr=c818c480,sc=c8570000,idx=18)
Uh-Oh! (0:3:0) mf=c81806c0, mr=c818c480, sc=c8570000
IOCStatus=0000h, SCSIState=01h, SCSIStatus=02h, IOCLogInfo=00000000h
sc->result set to 00000002h
scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
mptscsih: OldReset scheduling BUS_RESET (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
scsi : aborting command due to timeout : pid 4, scsi0, channel 0, id 3, lun 0 Inquiry 00 00 00 ff 00
mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c8570000)
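To make the status line in the debug output easier to read, here is a minimal Python sketch that parses the "IOCStatus=..., SCSIState=..., SCSIStatus=..., IOCLogInfo=..." line and names the SCSI status byte. The status names come from the standard SCSI Architecture Model status table; the function names (parse_status_line, describe_scsi_status) are hypothetical helpers, not part of the driver, and the other fields are left as raw numbers.

```python
import re

# SCSI status byte values as defined in the SCSI Architecture Model (SAM).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Matches fields of the form Name=<hex digits>h, as printed by the driver.
FIELD_RE = re.compile(r"(\w+)=([0-9A-Fa-f]+)h")


def parse_status_line(line):
    """Return a dict mapping each field name to its integer value."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(line)}


def describe_scsi_status(fields):
    """Translate the SCSIStatus field into its SAM name, if known."""
    code = fields["SCSIStatus"]
    return SCSI_STATUS.get(code, "unknown (0x%02x)" % code)


# Status line copied from the debug output in this comment.
line = "IOCStatus=0000h, SCSIState=01h, SCSIStatus=02h, IOCLogInfo=00000000h"
fields = parse_status_line(line)
print(describe_scsi_status(fields))
```

For the line in this comment, the table maps 02h to CHECK CONDITION; BUSY would be 08h. If that reading is right, the sense data returned for the Inquiry may be worth capturing as well.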
Here is a bit of status so far, but no conclusion yet. I ran 2.4.21-20.ELsmp and had no problem at all:

kernel: Fusion MPT base driver 2.05.16
kernel: Copyright (c) 1999-2004 LSI Logic Corporation
kernel: mptbase: Initiating ioc0 bringup
kernel: ioc0: 53C1030: Capabilities={Initiator,Target}
kernel: mptbase: Initiating ioc1 bringup
kernel: ioc1: 53C1030: Capabilities={Initiator,Target}
kernel: mptbase: 2 MPT adapters found, 2 installed.
kernel: Fusion MPT SCSI Host driver 2.05.16
kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
kernel: scsi3 : ioc1: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
kernel: blk: queue f7600e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: Vendor: SEAGATE Model: DAT DAT72-000 Rev: A060
kernel: Type: Sequential-Access ANSI SCSI revision: 03
kernel: blk: queue f6ec5618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: Attached scsi tape st0 at scsi2, channel 0, id 6, lun 0
kernel: resize_dma_pool: unknown device type 12
kernel: SCSI device sdd: 0 512-byte hdwr sectors (0 MB)
kernel: blk: queue f6ec5218, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
kernel: resize_dma_pool: unknown device type 12
kernel: SCSI device sdd: 0 512-byte hdwr sectors (0 MB)

This driver and kernel version are slightly different from the customer's. My next step is to re-test with the driver the customer has.

It would be helpful to get the tape model and revision info from the customer's system, so we can see if it matches what I have. It will be in dmesg on one of the earlier kernels that works.

The driver was updated in U6, so if the customer could try that (or the U7 beta), it may also provide some useful info.
Here is the tape model and revision from the 2.4.21-4.ELsmp dmesg:

Fusion MPT base driver 2.05.05+
Copyright (c) 1999-2002 LSI Logic Corporation
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator,Target}
mptbase: 2 MPT adapters found, 2 installed.
Fusion MPT SCSI Host driver 2.05.05+
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=25
Starting timer : 0 0
blk: queue c6d70e18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Vendor: HP Model: C7438A Rev: V312
Type: Sequential-Access ANSI SCSI revision: 03
Starting timer : 0 0
blk: queue c6d70c18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
mptscsih: ioc0: scsi0: Id=3 Lun=0: Queue depth=1
Okay, I tested the same kernel and driver as the customer (2.4.21-32.0.1.ELsmp, driver 2.05.16.02) on the DAT 72 tape model that I have. No problem. I also tested 2.4.21-20.ELsmp (mentioned above) and 2.4.21-38.ELsmp (U7 beta, with driver version 2.06.16.01). No problem.

So, the issue may be specific to the particular HBA model, the HBA FW version (01032700h vs. my 01032740h), the difference between the HP and Seagate versions of this drive, or something else in the kernel. Sometimes when all the commands issued to the drive time out, it is due to an interrupt routing problem.

Can the customer try the U7 beta, just so they are running the latest? Next, ask them to post the full dmesg after booting a kernel that works and one that does not work, so we can look at interrupt issues, etc. Are they willing to run some test drivers for us? Also ask them to check with HP and see if they have the latest drive firmware.

I guess I'll need to ask HP to summarize the differences between the drive models.
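Once both dmesg outputs are available, the adapter lines can be compared mechanically. The following Python sketch pulls the FwRev and IRQ values out of the "scsiN : iocN: ..." lines and lists what differs between two logs. The function names (adapters, diff_adapters) are hypothetical, and the two sample inputs are the adapter lines already quoted in this bug (the customer's system and my test system); the same function could be pointed at a working and a non-working boot of the customer's machine.

```python
import re

# Matches the per-adapter line printed by the Fusion MPT SCSI host driver.
ADAPTER_RE = re.compile(
    r"scsi(\d+) : (ioc\d+): (\w+), FwRev=([0-9A-Fa-f]+)h, "
    r"Ports=(\d+), MaxQ=(\d+), IRQ=(\d+)"
)


def adapters(dmesg_text):
    """Return {ioc name: {chip, fwrev, irq, host}} for each adapter line."""
    result = {}
    for m in ADAPTER_RE.finditer(dmesg_text):
        host, ioc, chip, fw, _ports, _maxq, irq = m.groups()
        result[ioc] = {
            "host": int(host),
            "chip": chip,
            "fwrev": fw.lower(),
            "irq": int(irq),
        }
    return result


def diff_adapters(a, b):
    """List (ioc, field, value_a, value_b) for fields that differ."""
    diffs = []
    for ioc in sorted(set(a) & set(b)):
        for key in ("chip", "fwrev", "irq"):
            if a[ioc][key] != b[ioc][key]:
                diffs.append((ioc, key, a[ioc][key], b[ioc][key]))
    return diffs


# Adapter lines copied from the logs posted in this bug.
customer = """\
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=24
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=25
"""
lab = """\
kernel: scsi2 : ioc0: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
kernel: scsi3 : ioc1: LSI53C1030, FwRev=01032740h, Ports=1, MaxQ=255, IRQ=19
"""

for entry in diff_adapters(adapters(customer), adapters(lab)):
    print(entry)
```

With the lines quoted so far, this reports the firmware revision and the IRQ assignment as the differences on both adapters. It only compares what the driver prints, so it cannot by itself show whether interrupts are actually being delivered.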
Created attachment 123306 [details] dmesg from 2.4.21-4
Created attachment 123307 [details] dmesg from 2.4.21-38