Bug 480158 - RHEL 4.8 mpt driver fails to bring up device
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 4.8
Assigned To: Tomas Henzl
QA Contact: Martin Jenner
Keywords: Regression
Depends On:
Blocks: 445361
 
Reported: 2009-01-15 09:25 EST by Vivek Goyal
Modified: 2009-09-03 10:11 EDT

Doc Type: Bug Fix
Last Closed: 2009-05-18 15:25:18 EDT


Attachments
disable msi (272 bytes, patch)
2009-01-16 09:41 EST, Tomas Henzl
The patch is to fix an issue of incorrectly setting DMA mask for 106XE controllers (7.82 KB, patch)
2009-01-20 02:47 EST, Sathya Prakash


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2009:1024
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update
Last Updated: 2009-05-18 10:57:26 EDT

Description Vivek Goyal 2009-01-15 09:25:09 EST
Description of problem:
mpt driver fails to bring up the device and fails in mptbase.

RHTS logs link

http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/distribution/kernelinstall&result=Fail&rwhiteboard=kernel%202.6.9-78.30.EL.vgoyal.test4&arch=x86_64&jobids=41921


Loading mptbase.ko module
Fusion MPT base driver 3.12.29.00rh
Copyright (c) 1999-2008 LSI Corporation
Loading mptscsi.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.12.29.00rh
Loading mptsas.ko module
Fusion MPT SAS Host driver 3.12.29.00rh
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 35 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 35 (level, low) -> IRQ 185
mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
scsi0 : ioc0: LSISAS1064E, FwRev=010a0000h, Ports=1, MaxQ=268, IRQ=185
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptscsi: ioc0: attempting task abort! (sc=000001025f465040)
scsi0 : destination target 0, lun 0
        command = Inquiry 00 00 00 24 00 
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
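
The repeating pattern in the log above (invalid-reply warnings followed by another recovery attempt) can be counted mechanically. The following is a minimal sketch, assuming the console output has been captured as plain text; the `summarize` helper and the `LOG` excerpt are illustrative and not part of any Red Hat or LSI tooling.

```python
import re
from collections import Counter

# Short excerpt of the console output quoted in this report.
LOG = """\
mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: mpt_reply: WARNING - ioc0: Invalid cb_idx (0)!
mptbase: Initiating ioc0 recovery
"""


def summarize(log: str) -> Counter:
    """Count IOC recovery attempts and invalid-callback warnings (hypothetical helper)."""
    counts = Counter()
    for line in log.splitlines():
        if re.search(r"Initiating ioc\d+ recovery", line):
            counts["recovery"] += 1
        elif "Invalid cb_idx" in line:
            counts["invalid_cb_idx"] += 1
    return counts


summary = summarize(LOG)
print(f"recoveries={summary['recovery']} invalid_cb_idx={summary['invalid_cb_idx']}")
```

A recovery count that keeps growing without the device ever coming up is the symptom reported here.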

Version-Release number of selected component (if applicable):

Happened on test kernel 2.6.9-78.30.EL.vgoyal.test4, which is primarily some patches on top of 29.EL. I suspect it was caused by the mpt patches that went into previous versions.

How reproducible:
Saw once.


Comment 2 Tomas Henzl 2009-01-16 09:24:07 EST
In RHEL5 we also had an issue with mpt: Bug 474465. Maybe it is a similar problem with MSI enabled by default, which is new in mpt 3.12.29.
Comment 3 Tomas Henzl 2009-01-16 09:41:22 EST
Created attachment 329209
disable msi

This patch disables msi, I think it's worth - we already have had similar problems in rh5.
Comment 4 Tomas Henzl 2009-01-16 09:47:53 EST
(In reply to comment #3)
> This patch disables msi, I think it's worth - we already have had similar
> problems in rh5.
I only wanted to say that it should be tested, not that it is worth itself, my English is bad, I'll better stop explaining :)
Comment 5 Tomas Henzl 2009-01-16 12:02:49 EST
I've just tested the msi disable patch without success. I have only found that disabling the whole patch linux-2.6.9-mptfusion-update-mpt-fusion-to-version-3.12.29.00rh.patch makes the box work again.

Vivek,
should we continue here or reopen bz452163 ?

Rob,
the machine is yours again.
Comment 7 Sathya Prakash 2009-01-19 10:09:30 EST
Can anyone point me to the link where I can download the kernel with the driver? I will try to reproduce it locally and look into it further.
Thanks
Sathya
Comment 8 Vivek Goyal 2009-01-19 10:27:26 EST
(In reply to comment #7)
> Can anyone point me to the link where I can download the kernel with the driver? I
> will try to reproduce it locally and look into it further.

http://people.redhat.com/vgoyal/rhel4/
Comment 9 Tomas Henzl 2009-01-19 12:11:24 EST
It looks to me that the problem is in mpt_attach. The change to using mpt_mapresources instead of dealing with resources in mpt_attach looks suspicious. We are now calling pci_enable_device(pdev) in mpt_attach and then in mpt_mapresources, maybe the patch to mpt_mapresources was somehow inaccurate.
Comment 10 Rob Evers 2009-01-19 13:55:07 EST
This patch was tested successfully on a system with the following hba:

[root@dl585-03 ~]# lspci | grep -i lsi
07:09.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)

With this patch the system hangs (as described above) when tested on a system
with the following hba:

[root@amd-shanghai-01 ~]# lspci | grep -i lsi
06:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
[root@amd-shanghai-01 ~]#

The problem occurs consistently when observed.

Requested that LSI reproduce this problem.

(Shifting to use this bug report to track this problem from the original patch request, bz452163.)
Comment 11 Sathya Prakash 2009-01-20 00:06:25 EST
It looks like an issue we captured internally. Is it occurring only on systems with more than 4GB of RAM and only with PCI-E cards?

The internal defect description is as below.

"When the 3.12.29.XX driver is loaded in systems having 4GB or greater, the system became unresponsive.   The reason for this problem is, the 106E B1 and older chip have errata in the device driver where the driver forces all data transfers to be in less than 4GB physical addressing space.  The bug is due to requesting for 64 bit address in the driver for 106E B1 chip and assuming them as 32bit addresses and putting them in 32-Bit scatter gather list.  By doing this the upper 32 bit address was lost.  So the DMA is actually occurring to the incorrect physical location.  That is resulted in infinite IOC recoveries. The issue is seeded when power management support is added in the driver. The fix is to request for 32bit addresses instead of 64bit addresses"

I will provide a test patch soon.

Thanks
Sathya
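
As an illustration of the defect described in comment 11, the address truncation can be modeled in a few lines. This is a sketch only, not the mptfusion driver code: the names `SGE32_MASK`, `store_in_32bit_sge`, and `dma_is_correct` and the example addresses are hypothetical, chosen to show what happens when a buffer above 4GB is written into a 32-bit scatter-gather element.

```python
# Illustration only: models the address truncation described in comment 11.
# All names and addresses here are hypothetical, not taken from the driver.

SGE32_MASK = 0xFFFFFFFF  # a 32-bit scatter-gather element holds 32 address bits


def store_in_32bit_sge(phys_addr: int) -> int:
    """Return the address as the controller would read it from a 32-bit SGE."""
    return phys_addr & SGE32_MASK


def dma_is_correct(phys_addr: int) -> bool:
    """True when the truncated address still points at the intended buffer."""
    return store_in_32bit_sge(phys_addr) == phys_addr


below_4g = 0x7FFF0000      # buffer under the 4GB boundary
above_4g = 0x1_2345_6000   # buffer above 4GB, possible when 64-bit addresses are requested

for addr in (below_4g, above_4g):
    seen = store_in_32bit_sge(addr)
    print(f"requested {addr:#x} -> controller sees {seen:#x}, ok={dma_is_correct(addr)}")
```

With a 32-bit DMA mask, buffers are never allocated above 4GB, so the second case cannot arise. That matches the fix described in the comment: request 32-bit addresses instead of 64-bit addresses.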
Comment 12 Sathya Prakash 2009-01-20 02:47:37 EST
Created attachment 329437
The patch is to fix an issue of incorrectly setting DMA mask for 106XE controllers

This patch contains the fix described in the earlier comment.
Comment 13 Tomas Henzl 2009-01-20 07:43:11 EST
Sathya,
thanks, I can confirm that the patch resolves this issue.
Comment 14 RHEL Product and Program Management 2009-01-20 07:50:43 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 15 Tomas Henzl 2009-01-20 11:01:49 EST
I'm posting the patch on the internal list.
Comment 17 Vivek Goyal 2009-01-26 10:11:05 EST
Committed in 80.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 19 Chris Ward 2009-03-27 10:20:41 EDT
~~ Attention Partners! Snap 1 Released ~~
RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should
be a fix present, which addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug
at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test result details.
Include which arches were tested, the package version, and any applicable logs.

 - Red Hat QE Partner Management
Comment 20 Chris Ward 2009-04-16 09:31:25 EDT
Verified that the patch LSI confirmed is included in -88.EL.
Comment 22 errata-xmlrpc 2009-05-18 15:25:18 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
