Bug 230703 - Fusion MPT SPI Host driver not detecting any disks
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i386, OS: Linux
Priority: medium, Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-03-02 07:15 EST by Denis Leroy
Modified: 2009-03-22 02:11 EDT
CC: 11 users

Doc Type: Bug Fix
Last Closed: 2007-11-30 10:00:40 EST


Attachments
Kernel dmesg with MPT logs (24.88 KB, application/octet-stream)
2007-03-02 07:15 EST, Denis Leroy
Description Denis Leroy 2007-03-02 07:15:05 EST
- Description of problem:

Since Fedora FC-7/rawhide moved past 2.6.19, the LSI Logic Fusion MPT SPI Host
driver has stopped working completely: it simply fails to detect any attached
disks. Nothing suspicious related to MPT shows up in dmesg (see the attached
dmesg; there are ATA error messages, but it's unclear whether they have anything
to do with this): the driver detects the SCSI controller, but no disks.


- Version-Release number of selected component (if applicable):

The problem is known to occur with kernel 2.6.20-1.2953.fc7 (MPT driver 3.04.04).

OTOH, the driver works fine with latest FC-6 (2.6.19-1.2911.fc6, MPT driver
3.04.02) and RHEL5-RC2 (2.6.18-8.el5, MPT driver 3.04.02).

I was also unable to reproduce the problem with the same FC7 kernel on a
SunFire X4200 (mptsas.ko, x86_64), so it seems to affect mptspi.ko
specifically (or to be i386-only).


- How reproducible:

Always.


- Steps to Reproduce:
 
The best way to reproduce is with VMware Workstation, which uses MPT SPI for
its SCSI interface (1000:0030 LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT
Dual Ultra320 SCSI). The driver will not detect any disks, making it impossible
to boot any VM from a SCSI disk. The workaround is to configure the VM disk as
an IDE disk. Even once FC7 has booted from the IDE disk, any extra SCSI disks
attached are not detected.
Comment 1 Denis Leroy 2007-03-02 07:15:05 EST
Created attachment 149105
Kernel dmesg with MPT logs
Comment 2 Thomas M Steenholdt 2007-03-05 05:26:29 EST
Same issue here, with the LSI Logic adapter in VMware Workstation 5.5.3. Let me
know if I can provide any useful info.
Comment 3 Eric Paris 2007-03-13 14:31:43 EDT
So I hacked up my 2.6.20 initrd to pause for 5 seconds after loading mptspi and
then cat /proc/interrupts every second for 5 more seconds. Interrupt 16 had
fired 14 times during the 5-second pause, but over the next 5 seconds it did
not fire at all.

**** 2.6.20 ****

Loading mptbase.ko module
Fusion MPT base driver 3.04.04
Copyright (c) 1999-2007 LSI Logic Corporation
Loading mptscsih.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.04.04
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 16
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=16
Sleeping 5 seconds
           CPU0       CPU1
  0:        717          0   IO-APIC-edge      timer
  1:          7          2   IO-APIC-edge      i8042
  4:         24         57   IO-APIC-edge      serial
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:        106          0   IO-APIC-edge      i8042
 16:          0         14   IO-APIC-fasteoi   ioc0
NMI:          0          0
LOC:       3061       3511
ERR:          0
MIS:          0

**** 2.6.19 *****

Fusion MPT base driver 3.04.03
Copyright (c) 1999-2007 LSI Logic Corporation
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
Fusion MPT SPI Host driver 3.04.03
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 16
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=16
scsi 0:0:0:0: Direct-Access     VMware,  VMware Virtual S 1.0  PQ: 0 ANSI: 2
 target0:0:0: Beginning Domain Validation
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: Write Protect is off
sda: Mode Sense: 5d 00 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: Write Protect is off
sda: Mode Sense: 5d 00 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
 sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
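The polling step at the start of this comment was done with a shell hack inside the initrd. As a rough user-space equivalent, the C sketch below rereads /proc/interrupts once per second and sums the per-CPU counters for one IRQ (16 for ioc0 here). The function names irq_total and poll_irq are hypothetical, and the parser assumes the usual line layout: an IRQ number, a colon, per-CPU decimal counts, then the chip and handler names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the sum of the per-CPU counters on one /proc/interrupts line if
 * that line belongs to IRQ `irq`; return -1 for any other line (other
 * IRQs, the CPU header, or named rows such as NMI:/LOC:/ERR:). */
long irq_total(const char *line, int irq)
{
    const char *p = line;
    char *end;

    while (isspace((unsigned char)*p))
        p++;
    long n = strtol(p, &end, 10);
    if (end == p || *end != ':' || n != irq)
        return -1;

    long total = 0;
    p = end + 1;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;              /* reached the chip/handler name columns */
        total += strtol(p, &end, 10);
        p = end;
    }
    return total;
}

/* Print the total count for `irq` once per second for `seconds` seconds,
 * roughly mimicking the initrd hack described in this comment. */
void poll_irq(int irq, int seconds)
{
    char line[4096];

    for (int i = 0; i < seconds; i++) {
        FILE *f = fopen("/proc/interrupts", "r");
        if (f == NULL) {
            perror("/proc/interrupts");
            return;
        }
        while (fgets(line, sizeof line, f) != NULL) {
            long t = irq_total(line, irq);
            if (t >= 0)
                printf("t=%ds IRQ %d total=%ld\n", i, irq, t);
        }
        fclose(f);
        sleep(1);
    }
}
```

Calling poll_irq(16, 5) right after loading mptspi would print the ioc0 total once per second; a total that stays flat after the initial burst corresponds to the stalled interrupt count reported here.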
Comment 4 Eric Moore 2007-03-15 13:45:14 EDT
I believe this issue is only seen in a VMware environment; is that correct? The
log back at comment #1 shows no devices, so there's nothing else to
decipher from it.

Just some info on the versions: the 3.04.04 driver was added after the 2.6.20
kernel was released, meaning you will find it in 2.6.21-rc3. I didn't check
whether it was ported back to the stable branch, such as 2.6.20.3. Is
that what you're on? The 3.04.03 driver is what you will find in the stock
2.6.20 kernel.

In reviewing the changes in that patch, nothing strikes me as something that
would cause this problem in mptspi.c. Mostly I was reorganizing the code;
some debug printks were added, and I added support for more than 255
LUNs and targets. It's possible that the "greater than 255" target support is
the problem. If you look in mptspi_probe, I changed max_id from
MPT_MAX_SCSI_DEVICES to ioc->devices_per_bus. This value was calculated from
port->PortSCSIID, which is obtained from some configuration pages called
port facts, returned from firmware. Perhaps VMware, which is emulating the
firmware, is not setting this. Can you insert a printk in mptspi_probe to
see what max_id is set to?
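To make the suspected change concrete, here is a small user-space model of how max_id is chosen before and after 3.04.04. Everything in it is hypothetical: struct port_facts_model is a trimmed stand-in, not the real MPI PortFacts layout, and MPT_MAX_SCSI_DEVICES is assumed to be 16. This comment attributes the value to PortSCSIID; comment 6 finds that MaxDevices is the field actually used, so the model reads MaxDevices.

```c
#include <stdint.h>

/* Assumption: the fixed per-bus limit used by the 3.04.03 driver
 * (16 IDs on a wide SPI bus). */
#define MPT_MAX_SCSI_DEVICES 16

/* Hypothetical, trimmed-down stand-in for the two PortFacts reply fields
 * discussed in this bug; this is NOT the real MPI structure layout. */
struct port_facts_model {
    uint16_t max_devices;   /* MaxDevices as reported by the firmware */
    uint16_t port_scsi_id;  /* PortSCSIID: the initiator's own ID */
};

/* 3.04.03-style: max_id is a compile-time constant, so the firmware's
 * port facts are ignored. */
int max_id_fixed(const struct port_facts_model *pf)
{
    (void)pf;
    return MPT_MAX_SCSI_DEVICES;
}

/* 3.04.04-style, as described above: max_id follows a firmware-reported
 * value (assumed here to be MaxDevices), capped at 256. */
int max_id_from_facts(const struct port_facts_model *pf)
{
    int n = pf->max_devices;
    return n > 255 ? 256 : n;
}
```

With the VMware values from comment 6 (MaxDevices 0, PortSCSIID 7), max_id_fixed still returns 16 while max_id_from_facts returns 0.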
Comment 5 Eric Paris 2007-03-15 14:34:39 EDT
I'll get that information on Monday when I get back to my office.  Thanks.
Comment 6 Denis Leroy 2007-03-16 06:03:37 EDT
Eric,

I can confirm your intuition is correct: max_id is indeed zero. If I change it to
MPT_MAX_SCSI_DEVICES, devices are probed correctly. Below are the Port Facts
returned by the VMware firmware. PortSCSIID is 7, but MaxDevices is 0,
and that's what ioc->devices_per_bus is computed from (is this computation
correct with respect to the semantics of the PortFacts values?).

MsgLength:         10
Function:           5
PortNumber:         0
MsgFlags:           0
MsgContext:         0
IOCStatus:          0
IOCLogInfo:         0
PortType:           1
MaxDevices:         0
PortSCSIID:         7
ProtocolFlags:      8
MaxPostedCmdBuffer: 0
MaxPersistentIDs:   0
MaxLanBuckets:      0
MaxInitiators:      0
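Why max_id = 0 means "no disks" can be seen from a simplified scan loop: the host probes each target ID below max_id except its own initiator ID (PortSCSIID, 7 here). This is a hypothetical model, assuming max_id is an exclusive upper bound and that the initiator ID is skipped; it is not the SCSI midlayer's actual code, and targets_probed is an invented name.

```c
/* Hypothetical model of a host scan: probe every target ID in
 * [0, max_id) except the host adapter's own ID, and count the probes. */
int targets_probed(int max_id, int initiator_id)
{
    int probed = 0;

    for (int id = 0; id < max_id; id++) {
        if (id == initiator_id)
            continue;   /* the adapter itself sits on this ID, not a disk */
        probed++;
    }
    return probed;
}
```

With the reported facts, targets_probed(0, 7) is 0, so nothing is probed at all; with MaxDevices at 16, targets_probed(16, 7) is 15.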
Comment 7 Tom Coughlan 2007-03-16 16:02:43 EDT
So this looks like an interaction between vmware and the latest mpt fusion driver. 
Adding Ed. Please refer this to the appropriate person at vmware. 
Comment 8 Ed Goggin 2007-03-16 16:35:19 EDT
Tom, thanks for the ping on this problem.

Eric, which Red Hat and Novell release streams did this change go into?
Comment 9 Eric Moore 2007-03-16 17:07:44 EDT
This is not going to impact any current RHEL/SLES release streams, except for
RHEL5.1. The 3.04.04 driver is only in the 2.6.21-rc bits. I need to post a
patch to lsml@ with the fix, hoping to handle this over the weekend. There is
bugzilla 225177, which is FATE for RHEL5.1; that driver will have this
problem. I will need to append this patch to that bugzilla.

On another note, can VMware fix the emulated 1030 so it fills in the port facts
properly? According to comment #6, MaxDevices is set to zero when it should
be 16.
Comment 10 Ed Goggin 2007-03-16 17:24:21 EDT
This was caught and fixed (MaxDevices is now set to 16) internally while testing
with FC-7 sometime earlier this month.
Comment 11 Thomas M Steenholdt 2007-03-22 17:28:53 EDT
Ed,

(perhaps slightly off-topic here, but not enough to keep me from posting :o))

Since it would appear that only VMs under VMware are bitten by this bug, can you
provide any information on whether this will be addressed (fixed) in some way,
shape or form? And perhaps a timeframe for such a fix?

Thanks :)
Comment 12 Denis Leroy 2007-03-23 04:19:21 EDT
Eric, what would the LSI driver-side patch look like? Test for MaxDevices == 0
and set it to something else? I'd like to try to push this patch on the Fedora
kernel side, hence my question. Thanks :-)
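One possible shape for the driver-side workaround asked about here is a fallback applied when the port facts are parsed: treat MaxDevices == 0 on an SPI port as "not filled in" and use the old fixed limit instead. This is a hypothetical sketch, not LSI's actual patch; spi_max_devices is an invented name and MPT_MAX_SCSI_DEVICES is assumed to be 16.

```c
#include <stdint.h>

/* Assumption: fall back to the old fixed per-bus limit of 16. */
#define MPT_MAX_SCSI_DEVICES 16

/* Hypothetical helper: sanitise a firmware-reported MaxDevices value for
 * an SPI port before it is used as the per-bus device count. A zero (as
 * VMware's emulated 53C1030 reports in comment 6) is treated as "field
 * not filled in" rather than "no devices allowed". */
uint16_t spi_max_devices(uint16_t reported)
{
    if (reported == 0)
        return MPT_MAX_SCSI_DEVICES;
    return reported;
}
```

Falling back only on zero leaves any real, non-zero firmware value untouched, so genuine hardware would keep the 3.04.04 behaviour.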
Comment 13 Tom Coughlan 2007-11-21 15:36:50 EST
(In reply to comment #9)
> I need to post a 
> patch to lsml@ with the fix, hoping to handle this over the weekend.  

Denis, is this all fixed now?  Please close if so. Thanks. 

Comment 14 Denis Leroy 2007-11-30 10:00:40 EST
Seems resolved with WS 6.0. Thanks.
Comment 15 Eli Collins 2007-11-30 11:57:47 EST
Is being resolved in an ESX 3 update as well.
Comment 16 risepp 2009-03-22 02:11:42 EDT
Hello, guys.
I am using Workstation 6.5 with Red Hat AS 4 installed in it, and I hit the same problem.
I changed /usr/src/linux/drivers/message/fusion/mptbase.c to fix this bug:

--- drivers/message/fusion/mptbase.orig.c 2007-07-20 18:47:21.000000000 +0000
+++ drivers/message/fusion/mptbase.c 2007-07-20 11:23:32.000000000 +0000
@@ -2564,6 +2564,10 @@
 	pfacts->IOCStatus = le16_to_cpu(pfacts->IOCStatus);
 	pfacts->IOCLogInfo = le32_to_cpu(pfacts->IOCLogInfo);
 	pfacts->MaxDevices = le16_to_cpu(pfacts->MaxDevices);
+	/* Fix VMware bug */
+	if (pfacts->MaxDevices == 0) {
+		pfacts->MaxDevices = 16;
+	}
 	pfacts->PortSCSIID = le16_to_cpu(pfacts->PortSCSIID);
 	pfacts->ProtocolFlags = le16_to_cpu(pfacts->ProtocolFlags);
 	pfacts->MaxPostedCmdBuffers = le16_to_cpu(pfacts->MaxPostedCmdBuffers);

and recompiled the kernel.

But it doesn't work. I also tried the workaround described at
http://theether.net/kb/100038, changing the adapter type from LSILogic to
BusLogic, but the problem still exists.

Am I doing something wrong? Please help me. Thanks.
