Description of problem:
The mpt driver fails to bring up the device and fails in mptbase.

RHTS logs link:
http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/distribution/kernelinstall&result=Fail&rwhiteboard=kernel%202.6.9-78.30.EL.vgoyal.test4&arch=x86_64&jobids=41921

Loading mptbase.ko module
Fusion MPT base driver 3.12.29.00rh
Copyright (c) 1999-2008 LSI Corporation
Loading mptscsi.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.12.29.00rh
Loading mptsas.ko module
Fusion MPT SAS Host driver 3.12.29.00rh
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 35 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 35 (level, low) -> IRQ 185
mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
scsi0 : ioc0: LSISAS1064E, FwRev=010a0000h, Ports=1, MaxQ=268, IRQ=185
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptscsi: ioc0: attempting task abort! (sc=000001025f465040)
scsi0 : destination target 0, lun 0
command = Inquiry 00 00 00 24 00
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
Version-Release number of selected component (if applicable):
Happened on test kernel 2.6.9-78.30.EL.vgoyal.test4. This is primarily some patches on top of 29.EL. I suspect it happened because of the mpt patches that went into previous versions.

How reproducible:
Saw once.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Noticed once more. http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/distribution/kernelinstall&result=Fail&rwhiteboard=kernel%202.6.9-78.30.EL.vgoyal.test5&arch=x86_64&jobids=41929
In RHEL5 we also had an issue with mpt: Bug 474465. Maybe it is a similar problem, with msi enabled by default, which is new in mpt 3.12.29.
Created attachment 329209
disable msi

This patch disables msi, I think it's worth - we already have had similar problems in rh5.
(In reply to comment #3) > This patch disables msi, I think it's worth - we already have had similar > problems in rh5. I only wanted to say that it should be tested, not that it is worth itself, my English is bad, I'll better stop explaining :)
I've just tested the msi disable patch without success. The only thing I have found that makes the box work again is disabling the whole patch linux-2.6.9-mptfusion-update-mpt-fusion-to-version-3.12.29.00rh.patch. Vivek, should we continue here or reopen bz452163? Rob, the machine is yours again.
Another instance of failure. http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/distribution/install&result=Fail&rwhiteboard=kernel%202.6.9-79.EL%20smp&arch=x86_64&jobids=42262
Anyone point me to the link where I can download the kernel with the driver. I will locally try to reproduce and look in further. Thanks Sathya
(In reply to comment #7) > Anyone point me to the link where I can download the kernel with the driver. I > will locally try to reproduce and look in further. http://people.redhat.com/vgoyal/rhel4/
It looks to me that the problem is in mpt_attach. The change to using mpt_mapresources instead of dealing with resources in mpt_attach looks suspicious. We are now calling pci_enable_device(pdev) in mpt_attach and then in mpt_mapresources, maybe the patch to mpt_mapresources was somehow inaccurate.
This patch was tested successfully on a system with the following hba:

[root@dl585-03 ~]# lspci | grep -i lsi
07:09.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)

With this patch the system hangs (as described above) when tested on a system with the following hba:

[root@amd-shanghai-01 ~]# lspci | grep -i lsi
06:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
[root@amd-shanghai-01 ~]#

The problem occurs consistently when observed. Requested LSI reproduce this problem.

(Shifting to use this bug report to track this problem from the original patch request, bz452163.)
It looks like an issue we have captured internally. Is it occurring only on systems with more than 4GB of RAM and only with PCI-E cards? The internal defect description is below.

"When the 3.12.29.XX driver is loaded in systems having 4GB or greater, the system became unresponsive. The reason for this problem is, the 106E B1 and older chip have errata in the device driver where the driver forces all data transfers to be in less than 4GB physical addressing space. The bug is due to requesting for 64 bit address in the driver for 106E B1 chip and assuming them as 32bit addresses and putting them in 32-Bit scatter gather list. By doing this the upper 32 bit address was lost. So the DMA is actually occurring to the incorrect physical location. That is resulted in infinite IOC recoveries. The issue is seeded when power management support is added in the driver. The fix is to request for 32bit addresses instead of 64bit addresses"

I will provide a test patch soon.

Thanks
Sathya
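To make the failure mode in the defect description concrete, here is a minimal sketch in plain Python (not driver code) of what happens when a 64-bit DMA address is written into a 32-bit scatter-gather address field. The function names and the example addresses are hypothetical and chosen only for illustration; the actual driver structures are not shown in this report.

```python
# Illustration only: models the address arithmetic, not the mptbase driver.
# All names below are hypothetical.

SGE32_ADDRESS_MASK = 0xFFFFFFFF  # a 32-bit SG entry can hold only 32 address bits


def store_in_32bit_sge(dma_addr: int) -> int:
    """Return the address as it would appear in a 32-bit SG address field."""
    return dma_addr & SGE32_ADDRESS_MASK


def is_truncated(dma_addr: int) -> bool:
    """True if storing the address in a 32-bit field loses its upper bits."""
    return store_in_32bit_sge(dma_addr) != dma_addr


if __name__ == "__main__":
    # One buffer below 4GB and one above it (example values, not from the logs).
    for addr in (0x7FFF0000, 0x123456000):
        print(hex(addr), "->", hex(store_in_32bit_sge(addr)),
              "truncated" if is_truncated(addr) else "intact")
```

With a buffer above 4GB, the controller would be handed only the low 32 bits, so the DMA would target a different physical location than the one the driver mapped. This is consistent with the question above about whether the problem appears only on systems with more than 4GB of RAM.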
Created attachment 329437
Patch to fix incorrect setting of the DMA mask for 106XE controllers

This patch contains the fix described in the earlier comment.
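The idea behind the fix (request 32-bit addresses instead of 64-bit ones for the affected chips) can be sketched as a mask-selection rule. This is a rough Python model under stated assumptions, not the contents of the attached patch: the flag names and helper functions are hypothetical, and how the real driver identifies an affected chip is not described in this report.

```python
# Illustration only: a model of choosing a DMA mask, not the attached patch.
# All names below are hypothetical.

DMA_MASK_32 = (1 << 32) - 1
DMA_MASK_64 = (1 << 64) - 1


def select_dma_mask(needs_32bit_addressing: bool,
                    host_supports_64bit_dma: bool) -> int:
    """Pick the widest mask the device can safely use.

    needs_32bit_addressing: assumed flag for a chip whose SG entries can
    only carry 32-bit addresses (the affected controllers in this bug).
    """
    if needs_32bit_addressing or not host_supports_64bit_dma:
        return DMA_MASK_32
    return DMA_MASK_64


def address_allowed(dma_addr: int, mask: int) -> bool:
    """True if every set bit of the address lies inside the mask."""
    return (dma_addr & mask) == dma_addr
```

With the 32-bit mask selected, an address above 4GB is rejected up front instead of being silently truncated, so buffers handed to the controller always fit the 32-bit scatter-gather entries.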
Sathya, thanks, I can confirm that the patch resolves this issue.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I'm posting the patch on internal list.
Committed in 80.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
~~ Attention Partners! Snap 1 Released ~~ RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should be a fix present which addresses this bug. NOTE: there is only a short time left to test, so please test and report back results on this bug at your earliest convenience. If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager. If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above. Please leave a comment with your test result details. Include which arches were tested, the package version, and any applicable logs. - Red Hat QE Partner Management
Verified that the patch LSI confirmed is included in -88.EL.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html