Bug 456638
| Summary: | [Kdump] not work on HP-XW8600 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Qian Cai <qcai> |
| Component: | kernel | Assignee: | Prarit Bhargava <prarit> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.2 | CC: | cwyse, dzickus, jeff.burrell, lwang, mgahagan, nhorman, rbinkhor, syeghiay, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-01-20 20:18:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Qian Cai
2008-07-25 07:55:54 UTC
Created attachment 312629 [details]
dmidecode output
Created attachment 312630 [details]
full log of endless ACPI errors from the second Kernel
Created attachment 312631 [details]
full log of the second Kernel hung
This might be a BIOS issue. Some details of the investigation so far:

- These systems have a GPE block, and the BIOS reports 2 addresses for it. One seems to be a 32bit address reachable from the RSDT and the other seems to be a 64bit address reachable from the XSDT. The 32bit version reports 0000F820 and the 64bit version reports 000000000001F028.

After adding some printk() calls, I found out that we default to using the 64bit version in both the first kernel and the second kernel. On a 32bit machine this address is beyond the IO port range of 0xffff. I believe that could be the reason that, when we try to disable the events in the second kernel, they effectively do not get disabled: the port address is beyond reach. And for some reason all the GPEs have fired in the second kernel.

So the questions to HP:

- Why are there two different addresses reported? Is it a bug, or does it signify something?
- The second address is beyond the IO port range, and we can't disable events in the second kernel even upon receiving a flood of events. Is there a way to avoid this issue?

I think forcing the use of the RSDT will make use of address F820, and that might fix the issue.

Thanks
Vivek

Vivek,

I'll get some answers from the BIOS engineers responsible for the xw8600 and get back to you. In the meantime, the output from dmidecode indicates you're using a fairly old version of the system BIOS (1.18). In the HP workstations IssueTracker #118848 there is the latest version (1.29) available. On the very small chance that something changed with regard to the GPE block, you should probably retry after flashing the system to 1.29. Hopefully I'll have some response shortly to share with you.

Jeff

*** Bug 458322 has been marked as a duplicate of this bug. ***

The problem appears to be that the address of the GPE0 register bank is reported incorrectly by the HP xw BIOS. From private email:

I can *prove* that there are two GPE0 addresses being reported. I downloaded the acpica-unix package from Intel, and the pmtools package from Fedora.
I installed them both and did the following:

    acpidump (to see if acpidump was working)
    acpidump -b -t FACP -o FADT.aml (dumps the first FACP table)
    iasl -d FADT.aml

This creates a file, FADT.dsl, which is a human-readable dump of the FADT table. From this file we see:

    [050h 080 4] GPE0 Block Address : 0000F828

The address for the GPE0 registers is 0xf828 ... and ...

    [0DCh 220 12] GPE0 Block : <Generic Address Structure>
    [0DCh 220 1] Space ID : 01 (SystemIO)
    [0DDh 221 1] Bit Width : 20
    [0DEh 222 1] Bit Offset : 00
    [0DFh 223 1] Access Width : 00
    [0E0h 224 8] Address : 000000000001F030

This other structure reports that the address is 0x1f030. Obviously ;) , 0x1f030 != 0xf828.

Our BIOS guys took a look yesterday and agree that what Prarit/Vivek have found is definitely a bug. Interestingly, this bug is in a common part of the BIOS that HP workstations share with the business desktops, and it has been in the tree for years, apparently without much consequence (until now :-), of course). Since we're heavily into our Nehalem/Tylersburg development, it will take me a little while to get a test BIOS for either the xw6600 or xw8600. I could, however, give you a test BIOS for the xw6800 (Tylersburg) protos you have if you wanted to test out the fix. Let me know if you want to try that. In the meantime, I'll see how quickly I can get something for the xw8600.

For any engineers who wish to take Jeff up on his offer of a test BIOS for the xw6800, please contact me for access to one of the prototypes.

Hi Jeff,

Could we get a test BIOS for the xw6800? I'll try to get you a list of problematic xw platforms,

P.

Prarit,

I thought I might be able to give you what I tested with here, which eliminates the GPE errors we've been seeing. Unfortunately that test BIOS is the current top-of-tree, which has a bunch of other things that would destabilize your prototypes. The BIOS guys are planning a formal release late this week, after which we can add this change back in for you to test with.
I expect to have a version I can give you sometime next week. Sorry for the delay...

Jeff

*** Bug 454998 has been marked as a duplicate of this bug. ***

*** Bug 454996 has been marked as a duplicate of this bug. ***

*** Bug 454974 has been marked as a duplicate of this bug. ***

Is there a BIOS patch for the xw9300 or xw9400? I am seeing a kdump failure on those two types of hardware. It can be fixed by using the noapic option, but I would be willing to test a new BIOS revision if possible.

No, currently there is no BIOS update available which fixes the bug identified by Prarit/Vivek for any of the shipping HP workstations. The only BIOS patched so far is for the prototype systems xw4800/xw6800/xw8800. The exact same problem exists in all HP workstations (and HP business desktops as well) because they have the same root BIOS tree in which this defect exists. The change will propagate, but slowly, as all of those shipping platforms are nearing the end of their life. Let me see if I can manage to get someone to make a special test version of the xw9400 BIOS for testing...

Jeff

Everyone,

Jeff Burke (jburke) recently reported seeing this behavior on an xw8600 in RHTS. I'm therefore proposing a WAR for this issue. I will attach the WAR patch shortly and will submit the RHKL ASAP.

P.

It looks like the -123.el5 kernel includes the fix. However, I have still seen those ACPI errors on IA-32 hp-xw6800-02.rhts.bos.redhat.com for both bare-metal and Kdump kernels. Please see attachments.

Created attachment 323304 [details]
ACPI errors from "dmesg" on hp-xw6800-02
Created attachment 323307 [details]
output of "dmidecode" on hp-xw6800-02
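The FADT comparison described earlier in this report can be sketched as a small script. This is only an illustration, not a tool used in the investigation: it parses an excerpt in the layout printed by `iasl -d` (values copied from the comment above), pulls out the legacy GPE0 address and the address in the GPE0 Generic Address Structure, and flags a mismatch or a SystemIO address above 0xffff. The names `parse_gpe0` and `check_gpe0` are hypothetical, and the parser assumes the field labels appear exactly as in the excerpt.

```python
import re

IO_PORT_MAX = 0xFFFF  # x86 I/O port space is 16 bits wide

# Excerpt in the layout shown in this report; values are from the comment above.
FADT_EXCERPT = """\
[050h 080 4] GPE0 Block Address : 0000F828
[0DCh 220 12] GPE0 Block : <Generic Address Structure>
[0DCh 220 1] Space ID : 01 (SystemIO)
[0DDh 221 1] Bit Width : 20
[0DEh 222 1] Bit Offset : 00
[0DFh 223 1] Access Width : 00
[0E0h 224 8] Address : 000000000001F030
"""

# "[offset info] Field Name : value"
FIELD_RE = re.compile(r"^\[[^\]]*\]\s*(?P<name>.+?)\s*:\s*(?P<value>.*)$")


def parse_gpe0(dump_text):
    """Return (legacy_address, extended_address) for GPE0, or None for a missing one."""
    legacy = None
    extended = None
    in_gpe0_gas = False  # True while inside the "GPE0 Block" structure
    for line in dump_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue
        name, value = match.group("name"), match.group("value")
        if name == "GPE0 Block Address":
            legacy = int(value, 16)
        elif name == "GPE0 Block":
            in_gpe0_gas = True
        elif in_gpe0_gas and name == "Address":
            extended = int(value, 16)
            in_gpe0_gas = False
    return legacy, extended


def check_gpe0(legacy, extended):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if legacy != extended:
        problems.append("legacy 0x%X != extended 0x%X" % (legacy, extended))
    if extended > IO_PORT_MAX:
        problems.append("extended 0x%X is beyond I/O port limit 0x%X"
                        % (extended, IO_PORT_MAX))
    return problems


if __name__ == "__main__":
    for problem in check_gpe0(*parse_gpe0(FADT_EXCERPT)):
        print(problem)
```

On the excerpt above, both checks report a problem: the two addresses differ, and the extended one cannot be reached as an I/O port.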
in kernel-2.6.18-123.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

New bug created for hp-xw6800-02.rhts.bos.redhat.com:

Bug 471341 - [5.3] ACPI Error (evgpe-0711) on HP xw6800

Testing results so far for today, 13 Nov. 2008, with the 2.6.18-123 IA-32 kernel:

| machine | bare-metal Kdump | Xen Domain 0 Kernel |
|---|---|---|
| hp-xw6800-02 | Fail | ?? |
| hp-xw8800-01 | ?? | Fail[1] |

[1] http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=5127343

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

Can someone post the WAR patch that was added to RHEL 5.3? I have a customer who may need to apply something similar to FC8 prior to releasing a BIOS fix. This patch should be a reasonably good place to begin.

Thanks!
Jeff

Comment on attachment 319276 [details]
RHEL5 WAR for this issue
Jeff, here is the patch. We later made a change to this to catch *all* "HP xw" systems because of the large number of models that were failing.
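The patch itself lives in attachment 319276 and is not reproduced in this report, so the following is only a rough sketch of the idea discussed in the thread, not the actual RHEL5 change. It assumes the workaround keys off the DMI product name and, for any "HP xw" system, prefers the 32-bit GPE0 address over the 64-bit one. The function names and the example product strings are hypothetical.

```python
def is_affected_hp_xw(product_name):
    """Assumed match rule: any DMI product name starting with "HP xw"."""
    return product_name.strip().lower().startswith("hp xw")


def pick_gpe0_address(product_name, legacy, extended):
    """Choose which GPE0 address to use.

    legacy   -- 32-bit address (the one reachable via the RSDT in this report)
    extended -- 64-bit address (the one reachable via the XSDT), or 0 if absent
    """
    if is_affected_hp_xw(product_name):
        # Workaround path: distrust the 64-bit value on affected firmware.
        return legacy
    # Default path in this sketch: use the 64-bit value when one is present.
    return extended if extended else legacy


if __name__ == "__main__":
    # The product string is an assumed example, not taken from the dmidecode attachment.
    chosen = pick_gpe0_address("HP xw8600 Workstation", 0xF828, 0x1F030)
    print("using GPE0 address 0x%X" % chosen)
```

Matching on the "HP xw" prefix rather than on individual models mirrors the later change mentioned above, which was made because so many models were failing.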