Bug 494120
| Summary: | XEN NMI detection fails on Dell 1950 server | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | kerdosa |
| Component: | kernel-xen | Assignee: | Miroslav Rezanina <mrezanin> |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.3 | CC: | clalance, dzickus, emcnabb, jburke, kerdosa, tom, xen-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-03-30 07:45:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 526775 | | |
| Attachments: | | | |
Description
kerdosa
2009-04-04 17:17:45 UTC
Some of the description above is wrong. check_nmi_watchdog() fails on both i386 and x86_64. Only XEN-3.3.1 from xensource works OK. This is a very critical function for debugging system deadlocks. Since NMI is not working, our options for debugging a deadlock are very limited.

Can you try to add "watchdog=1 apic_verbosity=debug" to the hypervisor command-line, and give the full output of xm dmesg after you've booted? It might give us a clue as to where the APIC NMI programming is going wrong, since the code in check_nmi_watchdog() is exactly the same in RHEL and in upstream Xen 3.3.

Thanks,
Chris Lalancette

Created attachment 342018 [details]
xm dmesg when successful
Created attachment 342019 [details]
xm dmesg when failed case
Hi,

I attached xm dmesg for both success and failed cases. The check_nmi_watchdog() is same, but many apic (or acpi) source code are different between two source trees.

Thanks

(In reply to comment #6)
> Hi,
>
> I attached xm dmesg for both success and failed cases. The check_nmi_watchdog()
> is same, but many apic (or acpi) source code are different between two source
> trees.

That's actually not true either, I looked through that code and the apic code between upstream Xen and RHEL-5 Xen is more-or-less the same too. So something else is going on, I'll have to look at logs to see if it tells us anything.

Chris Lalancette

Created attachment 350118 [details]
xm dmesg when failing on SuperMicro X7DBi+
Same problem here running Xen version 3.1.2-128.1.14.el5 on a SuperMicro X7DBi+ board.
The server started rebooting in a random fashion, that's why I'm experimenting with the watchdog option.
I've uploaded a test kernel that should have a fix for this problem here:

http://people.redhat.com/clalance/virttest/

Can the reporters who are having problems please download and try out this test kernel?

Thanks,
Chris Lalancette

in kernel-2.6.18-169.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

I've reproduced on -164.el5 and verified on -190.el5xen, and saw these results:

[root@dell-pe1950-06 ~]# uname -rm; xm dmesg | grep -i watchdog
2.6.18-164.el5xen i686
(XEN) Command line: com2=115200n8 watchdog=1
(XEN) Testing NMI watchdog --- CPU#0 stuck. CPU#1 stuck. CPU#2 stuck. CPU#3 stuck.

[root@dell-pe1950-06 ~]# uname -rm; xm dmesg | grep -i watchdog
2.6.18-190.el5xen i686
(XEN) Command line: com2=115200n8 watchdog=1
(XEN) Testing NMI watchdog --- CPU#0 okay. CPU#1 okay. CPU#2 okay. CPU#3 okay.

I have also checked x86_64:

[root@dell-pe1950-06 ~]# uname -rm; xm dmesg | grep -i watchdog
2.6.18-164.el5xen x86_64
(XEN) Command line: com1=115200n8 watchdog=1
(XEN) Testing NMI watchdog --- CPU#0 stuck. CPU#1 stuck. CPU#2 stuck. CPU#3 stuck.

[root@dell-pe1950-06 ~]# uname -rm; xm dmesg | grep -i watchdog
2.6.18-190.el5xen x86_64
(XEN) Command line: watchdog=1
(XEN) Testing NMI watchdog --- CPU#0 okay. CPU#1 okay. CPU#2 okay. CPU#3 okay.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
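For anyone checking several machines, the manual `xm dmesg | grep -i watchdog` verification used in this report could be scripted. The following is only a minimal sketch, assuming the hypervisor log contains a "Testing NMI watchdog" report with per-CPU `okay` / `stuck` entries in the format quoted in this bug. The function names `parse_nmi_watchdog` and `watchdog_ok` are hypothetical helpers and not part of Xen or any Red Hat tool.

```python
import re

# Matches one per-CPU entry such as "CPU#0 okay." or "CPU#3 stuck."
# (format assumed from the xm dmesg output quoted in this bug report).
CPU_STATUS = re.compile(r"CPU#(\d+)\s+(okay|stuck)\.")


def parse_nmi_watchdog(dmesg_text):
    """Return {cpu_number: "okay" | "stuck"} from the watchdog self-test report.

    Returns an empty dict if no "Testing NMI watchdog" report is present,
    for example when the hypervisor was booted without watchdog=1.
    """
    start = dmesg_text.find("Testing NMI watchdog")
    if start == -1:
        return {}
    return {int(cpu): status
            for cpu, status in CPU_STATUS.findall(dmesg_text[start:])}


def watchdog_ok(dmesg_text):
    """True only if a report was found and every listed CPU is "okay"."""
    statuses = parse_nmi_watchdog(dmesg_text)
    return bool(statuses) and all(s == "okay" for s in statuses.values())


# Sample input copied from the failing i686 case in this report.
sample = (
    "(XEN) Command line: com2=115200n8 watchdog=1\n"
    "(XEN) Testing NMI watchdog --- CPU#0 stuck. CPU#1 stuck. "
    "CPU#2 stuck. CPU#3 stuck.\n"
)
print(parse_nmi_watchdog(sample))
print(watchdog_ok(sample))
```

On a real host, the text passed to these helpers would be the captured output of `xm dmesg`. An empty result is treated as a failure, so a missing `watchdog=1` option is not mistaken for a working watchdog.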