Description of problem:
Since Fedora FC-7/rawhide moved past 2.6.19, the LSI Logic Fusion MPT SPI Host driver has stopped working completely: it fails to detect any attached disks. Nothing suspicious and MPT-related shows up in dmesg (see the attached dmesg; there are ata error messages, but it's unclear whether they have anything to do with this). The driver detects the SCSI controller, but no disks.

Version-Release number of selected component (if applicable):
The problem is known to occur with kernel 2.6.20-1.2953.fc7 (MPT driver 3.04.04). OTOH, the driver works fine with the latest FC-6 (2.6.19-1.2911.fc6, MPT driver 3.04.02) and RHEL5-RC2 (2.6.18-8.el5, MPT driver 3.04.02). I was also unable to reproduce the problem with the same fc7 kernel on a SunFire X4200 (mptsas.ko, x86_64), so it seems to affect mptspi.ko specifically (or is i386 only).

How reproducible:
Always.

Steps to Reproduce:
The best way to reproduce is with VMware Workstation, which uses MPT SPI for its SCSI interface (1000:0030 LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI). The kernel will not detect any disks, making it impossible to boot any VM from a SCSI disk. The workaround is to configure the VM disk as an IDE disk. Once FC7 has booted from the IDE disk, any extra SCSI disks attached are still not detected.
Created attachment 149105 [details] Kernel dmesg with MPT logs
same issue here - VMware workstation 5.5.3 based lsilogic. Let me know if I can provide any useful info.
So I hacked up my 2.6.20 initrd to pause for 5 seconds after loading mptspi and then cat /proc/interrupts every second for 5 more seconds. Interrupt 16 had fired 14 times during the 5-second pause, but over the next 5 seconds it did not fire again.

**** 2.6.20 ****
Loading mptbase.ko module
Fusion MPT base driver 3.04.04
Copyright (c) 1999-2007 LSI Logic Corporation
Loading mptscsih.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.04.04
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 16
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=16
Sleeping 5 seconds
           CPU0       CPU1
  0:        717          0    IO-APIC-edge     timer
  1:          7          2    IO-APIC-edge     i8042
  4:         24         57    IO-APIC-edge     serial
  8:          0          0    IO-APIC-edge     rtc
  9:          0          0    IO-APIC-fasteoi  acpi
 12:        106          0    IO-APIC-edge     i8042
 16:          0         14    IO-APIC-fasteoi  ioc0
NMI:          0          0
LOC:       3061       3511
ERR:          0
MIS:          0

**** 2.6.19 *****
Fusion MPT base driver 3.04.03
Copyright (c) 1999-2007 LSI Logic Corporation
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
Fusion MPT SPI Host driver 3.04.03
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 16
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=16
scsi 0:0:0:0: Direct-Access VMware, VMware Virtual S 1.0 PQ: 0 ANSI: 2
target0:0:0: Beginning Domain Validation
target0:0:0: Domain Validation skipping write tests
target0:0:0: Ending Domain Validation
target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: Write Protect is off
sda: Mode Sense: 5d 00 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: Write Protect is off
sda: Mode Sense: 5d 00 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
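For anyone who wants to repeat the interrupt sampling above without eyeballing the output, here is a minimal userspace sketch in C. It is not part of the initrd hack described in this comment; the function name irq_total is made up for illustration, and it assumes the usual /proc/interrupts layout (IRQ number, a colon, one counter per CPU, then the controller type). Calling it on two snapshots taken a second apart and comparing the results shows whether the IRQ is still firing.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters for one numeric IRQ in a
 * buffer holding /proc/interrupts-style text.
 * Returns -1 if no line for that IRQ is found.
 */
long irq_total(const char *text, int irq)
{
    const char *line = text;

    while (line && *line) {
        const char *p = line;
        char *end;

        /* Skip leading indentation on this line. */
        while (*p == ' ' || *p == '\t')
            p++;

        /* Only lines that start with "<number>:" describe numeric IRQs. */
        if (isdigit((unsigned char)*p)) {
            long n = strtol(p, &end, 10);

            if (*end == ':' && n == irq) {
                long total = 0;

                p = end + 1;
                for (;;) {
                    long v;

                    while (*p == ' ' || *p == '\t')
                        p++;
                    if (!isdigit((unsigned char)*p))
                        break;  /* reached the controller type column */
                    v = strtol(p, &end, 10);
                    /* A counter must be a whole whitespace-delimited token. */
                    if (*end != ' ' && *end != '\t' &&
                        *end != '\n' && *end != '\0')
                        break;
                    total += v;
                    p = end;
                }
                return total;
            }
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

With the 2.6.20 snapshot above, the line for IRQ 16 sums to 14; if a later snapshot still sums to 14, the controller has gone quiet.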
I believe this issue is only seen in a VMware environment, is that correct? The log back at comment #1 indicates no devices, so there's nothing else to decipher from the log.

Just some info on the versions: the 3.04.04 driver was added after the 2.6.20 kernel was released, meaning you will find it in 2.6.21-rc3. I didn't check whether it was ported back to the stable branch, such as 2.6.20.3. Is that what you're on? The 3.04.03 driver is what you will find in the stock 2.6.20 kernel.

In reviewing the changes in that patch, nothing in mptspi.c strikes me as a likely cause of this problem. Mostly I was reorganizing the code; some debug printks were added, and I added support for more than 255 LUNs and targets. It's possible that the "greater than 255" target support is the problem. If you look in mptspi_probe, I changed max_id from MPT_MAX_SCSI_DEVICES to ioc->devices_per_bus. This value was calculated from the port->PortSCSIID, which is obtained from a configuration page called port facts, returned by the firmware. Perhaps VMware, which is emulating the firmware, is not setting this. Can you insert a printk in mptspi_probe to see what max_id is set to?
I'll get that information on Monday when I get back to my office. Thanks.
Eric, I confirm that your intuition is correct: max_id is indeed zero. If I change it to MPT_MAX_SCSI_DEVICES, devices are probed correctly. Below are the Port Facts returned by the VMware firmware. PortSCSIID is equal to 7, but MaxDevices is 0, and that's what ioc->devices_per_bus is computed from (is this computation correct with respect to the semantics of the PortFacts values?).

MsgLength: 10
Function: 5
PortNumber: 0
MsgFlags: 0
MsgContext: 0
IOCStatus: 0
IOCLogInfo: 0
PortType: 1
MaxDevices: 0
PortSCSIID: 7
ProtocolFlags: 8
MaxPostedCmdBuffer: 0
MaxPersistentIDs: 0
MaxLanBuckets: 0
MaxInitiators: 0
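To make the failure mode concrete, here is a small userspace model in C. It is only a sketch, not driver code and not the LSI fix: struct port_facts_model, FALLBACK_MAX_DEVICES, effective_max_id and targets_scanned are all hypothetical names, the fallback of 16 is taken from the value MaxDevices is expected to have on this controller, and the scan loop is a simplification that assumes the host walks every target id below max_id except its own.

```c
#include <stdint.h>

/* Assumed fallback: the value MaxDevices is expected to report here. */
#define FALLBACK_MAX_DEVICES 16

/* Hypothetical stand-in for the two Port Facts fields that matter. */
struct port_facts_model {
    uint16_t max_devices;   /* MaxDevices as reported by the firmware */
    uint16_t port_scsi_id;  /* PortSCSIID, the initiator's own id */
};

/* Pick the bus width, distrusting a firmware that reports zero devices. */
unsigned effective_max_id(const struct port_facts_model *pf)
{
    return pf->max_devices ? pf->max_devices : FALLBACK_MAX_DEVICES;
}

/*
 * Simplified bus scan: count the target ids that would be probed.
 * With max_id == 0 the loop body never runs, so no disk is ever found.
 */
unsigned targets_scanned(unsigned max_id, unsigned host_id)
{
    unsigned id, n = 0;

    for (id = 0; id < max_id; id++) {
        if (id != host_id)  /* the initiator does not probe itself */
            n++;
    }
    return n;
}
```

Feeding in the values above (MaxDevices 0, PortSCSIID 7), the unguarded path scans no targets at all, while the guarded value of 16 scans the 15 ids other than the host's own.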
So this looks like an interaction between vmware and the latest mpt fusion driver. Adding Ed. Please refer this to the appropriate person at vmware.
Tom, thanks for the ping on this problem. Eric, which Red Hat and Novell release streams did this change go into?
This is not going to impact any current RHEL/SLES release streams, except for RHEL5.1. The 3.04.04 driver is only in the 2.6.21-rc bits. I need to post a patch to lsml@ with the fix, and I'm hoping to handle this over the weekend. There is bugzilla 225177, which is the FATE for RHEL5.1; that driver will have this problem, so I will need to append this patch to that bugzilla. On another note, can VMware fix the emulated 1030 so that it fills in the port facts properly? According to comment #6, MaxDevices is set to zero, when it should be 16.
This was caught and fixed (to set MaxDevices to 16) internally while testing with FC-7 sometime earlier this month.
Ed, (perhaps slightly off-topic here, but not enough to keep me from posting :o)) Since it would appear that only VMs under VMware are bitten by this bug, can you provide any information on whether this will be addressed (fixed) in some way, shape or form? And perhaps a timeframe for such a fix? Thanks :)
Eric, what would the LSI driver-side patch look like ? Test for MaxDevices==0 and set it to something else ? I'd like to try and push this patch on the Fedora kernel side, hence my question. Thanks :-)
(In reply to comment #9)
> I need to post a
> patch to lsml@ with the fix, hoping to handle this over the weekend.

Denis, is this all fixed now? Please close if so. Thanks.
Seems resolved with WS 6.0. Thanks.
Is being resolved in an ESX 3 update as well.
Hello, guys. I use Workstation 6.5, installed Red Hat AS 4 in it, and hit the same problem. I changed /usr/src/linux/drivers/message/fusion/mptbase.c to work around this bug:

--- drivers/message/fusion/mptbase.orig.c 2007-07-20 18:47:21.000000000 +0000
+++ drivers/message/fusion/mptbase.c 2007-07-20 11:23:32.000000000 +0000
@@ -2564,6 +2564,10 @@
     pfacts->IOCStatus = le16_to_cpu(pfacts->IOCStatus);
     pfacts->IOCLogInfo = le32_to_cpu(pfacts->IOCLogInfo);
     pfacts->MaxDevices = le16_to_cpu(pfacts->MaxDevices);
+    /* Fix VMware bug */
+    if(pfacts->MaxDevices == 0) {
+        pfacts->MaxDevices = 16;
+    }
     pfacts->PortSCSIID = le16_to_cpu(pfacts->PortSCSIID);
     pfacts->ProtocolFlags = le16_to_cpu(pfacts->ProtocolFlags);
     pfacts->MaxPostedCmdBuffers = le16_to_cpu(pfacts->MaxPostedCmdBuffers);

Then I recompiled the kernel, but it doesn't work. I also tried the workaround described at http://theether.net/kb/100038, changing the adapter type from LSILogic to BusLogic, but the problem still exists. Am I doing something wrong? Please help. Thanks.