Bug 480158
| Summary: | RHEL 4.8 mpt driver fails to bring up device | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Vivek Goyal <vgoyal> | ||||||
| Component: | kernel | Assignee: | Tomas Henzl <thenzl> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 4.8 | CC: | andriusb, coughlan, cward, jburke, jtluka, revers, sathya.prakash, thenzl | ||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||
| Target Release: | 4.8 | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2009-05-18 19:25:18 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 445361 | ||||||||
| Attachments: |
Description
Vivek Goyal
2009-01-15 14:25:09 UTC
In RHEL5 we also had an issue with mpt: Bug 474465. This may be a similar problem, caused by MSI being enabled by default, which is new in mpt 3.12.29.

Created attachment 329209 [details]
disable msi

This patch disables msi, I think it's worth - we already have had similar problems in rh5.

(In reply to comment #3)
> This patch disables msi, I think it's worth - we already have had similar
> problems in rh5.

I only wanted to say that it should be tested, not that it is worthwhile in itself. My English is bad, I'd better stop explaining :)

I've just tested the MSI disable patch, without success. The only thing I have found is that reverting the whole patch linux-2.6.9-mptfusion-update-mpt-fusion-to-version-3.12.29.00rh.patch makes the box work again. Vivek, should we continue here or reopen bz452163?

Rob, the machine is yours again.

Another instance of the failure:
http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/distribution/install&result=Fail&rwhiteboard=kernel%202.6.9-79.EL%20smp&arch=x86_64&jobids=42262

Can anyone point me to the link where I can download the kernel with the driver? I will try to reproduce the problem locally and look into it further.
Thanks
Sathya

(In reply to comment #7)
> Anyone point me to the link where I can download the kernel with the driver. I
> will locally try to reproduce and look in further.

http://people.redhat.com/vgoyal/rhel4/

It looks to me that the problem is in mpt_attach. The change to using mpt_mapresources, instead of dealing with resources in mpt_attach itself, looks suspicious. We are now calling pci_enable_device(pdev) in mpt_attach and then again in mpt_mapresources; maybe the patch to mpt_mapresources was somehow inaccurate.

This patch was tested successfully on a system with the following hba:
[root@dl585-03 ~]# lspci | grep -i lsi
07:09.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)

With this patch the system hangs (as described above) when tested on a system with the following hba:
[root@amd-shanghai-01 ~]# lspci | grep -i lsi
06:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
[root@amd-shanghai-01 ~]#

The problem occurs consistently when observed. I have asked LSI to reproduce this problem.
(Shifting to use this bug report to track this problem from the original patch request, bz452163.)

It looks like an issue we have captured internally. Is it occurring only on systems with more than 4GB of RAM and only with PCI-E cards? The internal defect description is below.

"When the 3.12.29.XX driver is loaded in systems having 4GB or greater, the system became unresponsive. The reason for this problem is, the 106E B1 and older chip have errata in the device driver where the driver forces all data transfers to be in less than 4GB physical addressing space. The bug is due to requesting for 64 bit address in the driver for 106E B1 chip and assuming them as 32bit addresses and putting them in 32-Bit scatter gather list. By doing this the upper 32 bit address was lost. So the DMA is actually occurring to the incorrect physical location. That is resulted in infinite IOC recoveries. The issue is seeded when power management support is added in the driver. The fix is to request for 32bit addresses instead of 64bit addresses"

I will provide a test patch soon.
Thanks
Sathya

Created attachment 329437 [details]
The patch is to fix an issue of incorrectly setting DMA mask for 106XE controllers

This patch contains the fix described in the earlier comment.
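
For readers following along, the truncation described in the internal defect description can be illustrated with a small userspace sketch. This is not the driver code and not the attached patch: `struct sge32`, `sge32_from_dma`, `sge32_is_misdirected`, `choose_dma_mask`, `addr_fits_mask`, and the `needs_32bit_dma` flag are all hypothetical names made up for this illustration. It only shows why storing a 64-bit bus address in a 32-bit scatter-gather field sends DMA to the wrong location once memory above 4GB is in use, and why limiting the mask to 32 bits avoids handing out such addresses in the first place.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit scatter-gather element: it only has room for the
 * low 32 bits of a bus address. Not the real MPT structure layout. */
struct sge32 {
    uint32_t flags_length;
    uint32_t address;
};

/* Mirrors the bug described above: a 64-bit DMA address is stored in a
 * 32-bit field, so the upper 32 bits are silently dropped. */
static struct sge32 sge32_from_dma(uint64_t dma_addr, uint32_t length)
{
    struct sge32 sge;

    sge.flags_length = length;
    sge.address = (uint32_t)dma_addr;   /* truncation happens here */
    return sge;
}

/* True when the address the controller would see differs from the one
 * that was mapped, i.e. the DMA would go to the wrong physical location. */
static bool sge32_is_misdirected(uint64_t dma_addr)
{
    return (uint64_t)sge32_from_dma(dma_addr, 0).address != dma_addr;
}

/* Sketch of the idea behind the fix: request a 32-bit mask for chips that
 * cannot address more. `needs_32bit_dma` is an assumed flag standing in
 * for whatever chip/revision check the real patch performs. */
static uint64_t choose_dma_mask(bool needs_32bit_dma)
{
    return needs_32bit_dma ? 0xFFFFFFFFull : ~0ull;
}

/* True when every set bit of the address is covered by the mask. */
static bool addr_fits_mask(uint64_t dma_addr, uint64_t mask)
{
    return (dma_addr & ~mask) == 0;
}
```

With these helpers, an address such as 0x100001000 (just above 4GB) is reported as misdirected by `sge32_is_misdirected`, while any address below 4GB is not. That matches the observation that the hang depends on how much RAM the system has. A 32-bit mask from `choose_dma_mask(true)` rejects the high address up front, instead of letting it be truncated later.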

Sathya, thanks, I can confirm that the patch resolves this issue.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I'm posting the patch on the internal list.

Committed in 80.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

~~ Attention Partners! Snap 1 Released ~~
RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should be a fix present which addresses this bug. NOTE: there is only a short time left to test, so please test and report back results on this bug at your earliest convenience.
If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager.
If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above. Please leave a comment with the details of your test results. Include which arches were tested, the package version, and any applicable logs.
- Red Hat QE Partner Management

Verified that the patch LSI confirmed is included in -88.EL.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-1024.html