Bug 188487 - Problem with MPT Fusion and external RAID.
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
MassClosed
Reported: 2006-04-10 11:51 EDT by William T. Musil
Modified: 2008-01-19 23:37 EST
Doc Type: Bug Fix
Last Closed: 2008-01-19 23:37:24 EST


Description William T. Musil 2006-04-10 11:51:47 EDT
Description of problem:
I have an nStor Wahoo SATA array (nStor-Xyratex 4700S, also sold as the Gateway 840 OEM).
This device has U320 host connectivity and presents its logical volumes and the
SAF-TE controller as multiple LUNs on the same SCSI ID. It works on Red Hat 9
(2.4.20) with the included MPT Fusion 2.03.00 driver. It does not work on FC5
(2.6.16) with the included MPT Fusion 3.03.07 driver: the system acts as if the
disk is bad, with superblock errors and buffer I/O errors on the device. Gateway
has tested the same device on the same computer type and saw the same problem.
They went one step further and installed an Adaptec 29160 in the system, using
the drivers that came with FC5, attached the array to the 29160, and it worked.
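
Since the array exposes several LUNs behind one SCSI ID, a quick check is whether the kernel registered every LUN for that target. The following is a minimal Python sketch that groups the "Host: ... Lun: ..." header lines of /proc/scsi/scsi by target. The sample text, including the vendor and model strings, is hypothetical and was not captured from the system in this report; the function name luns_by_target is likewise only illustrative.

```python
import re
from collections import defaultdict

# Matches the per-device header lines printed in /proc/scsi/scsi.
HEADER = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)

def luns_by_target(proc_scsi_text):
    """Map (host, channel, id) -> sorted list of LUNs listed for that target."""
    targets = defaultdict(list)
    for match in HEADER.finditer(proc_scsi_text):
        host, channel, target, lun = (int(g) for g in match.groups())
        targets[(host, channel, target)].append(lun)
    return {key: sorted(luns) for key, luns in targets.items()}

# Hypothetical sample (NOT from the reporter's machine).
SAMPLE = """\
Attached devices:
Host: scsi2 Channel: 00 Id: 04 Lun: 00
  Vendor: EXAMPLE  Model: VOLUME-0         Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 04 Lun: 01
  Vendor: EXAMPLE  Model: VOLUME-1         Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 04 Lun: 02
  Vendor: EXAMPLE  Model: ENCLOSURE        Rev: 0001
  Type:   Processor                        ANSI SCSI revision: 03
"""

if __name__ == "__main__":
    for target, luns in luns_by_target(SAMPLE).items():
        print(target, luns)
```

On a live system the same function could be fed the contents of /proc/scsi/scsi, and the result compared between the working and failing kernels.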


Version-Release number of selected component (if applicable):
Failing - (kernel=2.6.16-1.2080_FC5smp - using LSI Fusion MPT driver 3.03.07 
included in fedora5 distribution)
Working - kernel=2.4.20-8smp - using LSI Fusion MPT driver 2.03.00 included in 
redhat9 distribution
Motherboard = Intel SE7320VP2
BIOS = AMIBIOS SE7520JR23.15A.P.08.10.0081
SCSI Chipset = LSI53C1030
Firmware = LSI MPTBIOS-IME-5.10.04
Scan Luns >0 = Yes (default setting)


How reproducible: Always


Steps to Reproduce:
1. Attach a configured nStor Wahoo SATA 4700S or a Gateway 840 array to the
external VHD SCSI port on an LSI 53C1030 equipped system running FC5.
Actual results:
During startup, as soon as the device is initialized, a buffer I/O error is
written to /var/log/messages. fdisk and the LVM commands pvcreate, vgcreate,
and lvcreate all appear to work. mkfs against a plain or LVM filesystem fails
with short-read errors, fsck fails with a bad magic number in the superblock,
and none of the superblocks listed during mkfs can be used to fsck the filesystem.
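
To compare kernels or adapters, it can help to count how many buffer I/O errors each block device logs. The sketch below assumes the usual kernel wording "Buffer I/O error on device <name>, logical block <n>"; the sample lines and the device name sdb are hypothetical and are not taken from this report.

```python
import re
from collections import Counter

# Assumed kernel message wording; the device name is captured up to the comma.
BUFFER_IO = re.compile(r"Buffer I/O error on device (\S+?),", re.IGNORECASE)

def count_buffer_io_errors(lines):
    """Return a Counter mapping device name -> number of buffer I/O error lines."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical log lines for illustration only.
SAMPLE_LINES = [
    "Apr 10 09:00:01 host kernel: Buffer I/O error on device sdb, logical block 0",
    "Apr 10 09:00:01 host kernel: Buffer I/O error on device sdb, logical block 1",
    "Apr 10 09:00:02 host kernel: SCSI device sdb: drive cache: write through",
]

if __name__ == "__main__":
    print(count_buffer_io_errors(SAMPLE_LINES))
```

On a real system the function could be given the open /var/log/messages file object directly, since it only iterates over lines.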

Expected results:
As with Red Hat 9: the array is detected at boot, fdisk and LVM actions work,
mkfs works, fsck works, and the filesystem mounts.

Additional info: LSI reports that they do not provide drivers for Fedora and
that the MPT Fusion 3.03.07 driver kit is not their doing.
Comment 1 William T. Musil 2006-04-19 13:41:29 EDT
Further testing has been performed.

Installed an Adaptec 29320APL-R, using the driver included with FC5:
Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0

With the array attached to the 29320, the system sometimes panics during boot, at the udev stage:

BUG: spin lock recursion on CPU#0, scsi_eh 2/895 (Not tainted)
lock: f6d13ac0, .magic: dead4ead, .owner: scsi_eh_2/895, .owner_cpu: 0
Kernel panic - not syncing: bad locking


Sometimes the system boots, but an attempt at filesystem access results in
another panic, with the following in /var/log/messages:

Apr 14 16:43:31 nas1 kernel: sd 2:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0 0xec 0x6f 0x0 0x0 0x70 0x0
Apr 14 16:43:31 nas1 kernel: scsi2: At time of recovery, card was not paused
Apr 14 16:43:31 nas1 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Apr 14 16:43:31 nas1 kernel: scsi2: Dumping Card State at program address 0x37 Mode 0x22
Apr 14 16:43:31 nas1 kernel: Card was paused
Apr 14 16:43:31 nas1 kernel: INTSTAT[0x0] SELOID[0x4] SELID[0x40] HS_MAILBOX[0x0]
Apr 14 16:43:31 nas1 kernel: INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11] DFFSTAT[0x33]
Apr 14 16:43:31 nas1 kernel: SCSISIGI[0x25] SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0x1]
Apr 14 16:43:31 nas1 kernel: SCSISEQ0[0x40] SCSISEQ1[0x12] SEQCTL0[0x0] SEQINTCTL[0x0]
Apr 14 16:43:31 nas1 kernel: SEQ_FLAGS[0x0] SEQ_FLAGS2[0x4] QFREEZE_COUNT[0x1]
Apr 14 16:43:31 nas1 kernel: KERNEL_QFREEZE_COUNT[0x1] MK_MESSAGE_SCB[0xff00]
Apr 14 16:43:31 nas1 kernel: MK_MESSAGE_SCSIID[0xff] SSTAT0[0x10] SSTAT1[0x0]
Apr 14 16:43:31 nas1 kernel: SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0xc0] SIMODE1[0xac]
Apr 14 16:43:31 nas1 kernel: LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x80] LQOSTAT0[0x0]
Apr 14 16:43:31 nas1 kernel: LQOSTAT1[0x0] LQOSTAT2[0x40]
Apr 14 16:43:31 nas1 kernel:
Apr 14 16:43:31 nas1 kernel: SCB Count = 4 CMDS_PENDING = 4 LASTSCB 0x3 CURRSCB 0x2 NEXTSCB 0x3
Apr 14 16:43:31 nas1 kernel: qinstart = 185 qinfifonext = 185
Apr 14 16:43:31 nas1 kernel: QINFIFO:
Apr 14 16:43:31 nas1 kernel: WAITING_TID_QUEUES:
Apr 14 16:43:31 nas1 kernel:        4 ( 0x2 0x3 0x1 0x0 )
Apr 14 16:43:31 nas1 kernel: Pending list:
Apr 14 16:43:31 nas1 kernel:   0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x47]
Apr 14 16:43:31 nas1 kernel:   1 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x47]
Apr 14 16:43:31 nas1 kernel:   3 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x47]
Apr 14 16:43:31 nas1 kernel:   2 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x47]
Apr 14 16:43:31 nas1 kernel: Total 4
Apr 14 16:43:31 nas1 kernel: Kernel Free SCB list:
Apr 14 16:43:32 nas1 kernel: Sequencer Complete DMA-inprog list:
Apr 14 16:43:32 nas1 kernel: Sequencer Complete list:
Apr 14 16:43:32 nas1 kernel: Sequencer DMA-Up and Complete list:
Apr 14 16:43:32 nas1 kernel: Sequencer On QFreeze and Complete list:
Apr 14 16:43:32 nas1 kernel:
Apr 14 16:43:32 nas1 kernel:
Apr 14 16:43:32 nas1 kernel: scsi2: FIFO0 Free, LONGJMP == 0x826b, SCB 0x2
Apr 14 16:43:32 nas1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Apr 14 16:43:32 nas1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Apr 14 16:43:32 nas1 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Apr 14 16:43:32 nas1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Apr 14 16:43:32 nas1 kernel:
Apr 14 16:43:32 nas1 kernel: scsi2: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
Apr 14 16:43:32 nas1 kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Apr 14 16:43:32 nas1 kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Apr 14 16:43:32 nas1 kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Apr 14 16:43:32 nas1 kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Apr 14 16:43:32 nas1 kernel: LQIN: 0x8 0x0 0x0 0x2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Apr 14 16:43:32 nas1 kernel: scsi2: LQISTATE = 0x1, LQOSTATE = 0x5, OPTIONMODE = 0x52
Apr 14 16:43:32 nas1 kernel: scsi2: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
Apr 14 16:43:32 nas1 kernel: scsi2: SAVED_SCSIID = 0x0 SAVED_LUN = 0x0
Apr 14 16:43:32 nas1 kernel: SIMODE0[0xc]
Apr 14 16:43:32 nas1 kernel: CCSCBCTL[0x4]
Apr 14 16:43:32 nas1 kernel: scsi2: REG0 == 0x0, SINDEX = 0x10a, DINDEX = 0x10a
Apr 14 16:43:32 nas1 kernel: scsi2: SCBPTR == 0x1, SCB_NEXT == 0x0, SCB_NEXT2 == 0xffd1
Apr 14 16:43:32 nas1 kernel: CDB 2a 0 0 8 0 4f
Apr 14 16:43:32 nas1 kernel: STACK: 0x20 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Apr 14 16:43:32 nas1 kernel: <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Apr 14 16:43:32 nas1 kernel: scsi2:0:4:0: Cmd aborted from QINFIFO
Apr 14 16:43:32 nas1 kernel: aic79xx_abort returns 0x2002
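
For readers who want to know which command was being aborted, the ten CDB bytes on the first log line can be decoded by hand. The following is a minimal Python sketch that assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The function name decode_rw10 is only illustrative, and the sketch handles nothing beyond those two opcodes.

```python
def decode_rw10(cdb):
    """Decode a 10-byte READ(10)/WRITE(10) CDB into (name, lba, block_count)."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    opcode = cdb[0]
    if opcode not in names:
        raise ValueError("not a READ(10)/WRITE(10) opcode: 0x%02x" % opcode)
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")   # bytes 7..8: transfer length
    return names[opcode], lba, blocks

# CDB bytes copied from the "Attempting to queue an ABORT message" log line.
ABORTED_CDB = [0x28, 0x0, 0x0, 0x0, 0xEC, 0x6F, 0x0, 0x0, 0x70, 0x0]

if __name__ == "__main__":
    print(decode_rw10(ABORTED_CDB))
```

Under that assumption the aborted command is a READ(10) of 112 blocks starting at LBA 60527 on target 2:0:4:0. The shorter "CDB 2a 0 0 8 0 4f" line in the dump is only six bytes long, so this sketch does not attempt to decode it.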

The system boots and runs fine if the array is not attached.

Swapped out the Adaptec 29320 for an Adaptec 29160, using the driver included with FC5:
Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0

As Gateway's diagnosis indicated, everything works fine. There are no problems
accessing the filesystem on the device or performing 200 GB+ I/O operations
to/from an LTO3 tape attached to the integrated LSI using MPT Fusion 3.03.07.

I am up and running with the nStor 4700S attached to an Adaptec 29160 under FC5.

This does not appear to be exclusive to LSI.
I am pretty sure that the integrated LSI is PCI-E or PCI-X.
Could this be a multi-lun problem in the PCI-X drivers? 
Could this be a kernel timing issue in the PCI-X drivers?
Comment 2 Dan Carpenter 2006-04-20 22:50:28 EDT
LSI makes the controllers on those nStors.  They are bug compatible with the
nStor SCSI controllers in your Gateway box.  They won't work with Adaptec SCSI
cards.

There was a huge rewrite of the mpt fusion drivers between FC4 and FC5.

I'm seeing the same issues that you are seeing.

Comment 3 William T. Musil 2006-04-21 09:36:43 EDT
Actually, it seems that it only works when attached to an add-in Adaptec 29160.
I have been load and backup testing filesystems on it for two days.

You might want to try an Adaptec 29160 card.
Comment 4 Dave Jones 2006-10-16 13:33:11 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 5 Jon Stanley 2008-01-19 23:37:24 EST
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously there has been no update on the progress of this bug
therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.
