Bug 500845
Summary: [RHEL5-U4] Kernel - testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!
Product: Red Hat Enterprise Linux 5
Reporter: Jeff Burke <jburke>
Component: kernel-xen
Assignee: Don Zickus <dzickus>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4
CC: arozansk, bpeck, clalance, dmair, dzickus, gozen, jburke, llim, lwang, mgahagan, mjenner, pbunyan, phan, prarit, qcai, tcamuso, xen-maint
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: Linux
URL: http://rhts.redhat.com/testlogs/58610/195983/1633506/boot.messages
Doc Type: Bug Fix
Doc Text:
When booting a fully virtualized Xen guest, the following message may be displayed on the guest console:
testing NMI watchdog ... <4>
WARNING: CPU#0: NMI appears to be stuck (0->0)!
This issue is caused by an implementation issue with the Xen hypervisor and can be safely ignored. (BZ#500845)
Last Closed: 2010-11-08 22:33:29 UTC
Bug Blocks: 513501, 514491
Description
Jeff Burke
2009-05-14 14:00:36 UTC
Yeah, this is unfortunately expected. If I remember properly from my last foray into looking at this, we don't properly emulate the MSR writes to the performance counters. What happens is that the Linux kernel writes to the performance counter MSRs, and then expects an interrupt later on when the performance counters drop to 0. However, the hypervisor more-or-less just drops the writes to the performance counter MSRs on the ground, so a later interrupt is never generated, and then you get the "NMI appears to be stuck" message. I don't know what the current upstream status of this is, since I haven't looked in a while.

Chris Lalancette

Just as a side note, this problem isn't confined to kernel-xen on AMDs; it is happening all over the place for x86_64 HVM guests. I see it on Intel boxen and on x86_64 KVM guests as well.

(In reply to comment #2)
> Just as a side note, this problem isn't confined to kernel-xen on AMDs, they
> are happening all over the place for x86_64 hvm guests . I see them on Intel
> boxen and on x86_64 kvm guests as well,

Well, the Intel ones are related to bz 500892, which is basically defective chips. The AMD ones may be defective too; we just need to find the errata sheets on it.

Gurhan, is it possible to test a 5.3 distro on the AMD Shanghai machines? If the problem is there, then this isn't new and we may have to figure out how to deal with it. Otherwise it is a regression and will need to be fixed.

Ok, I submitted jobs to amd-shanghai-0[12] for a rhel5.3 tree. Will let you know of the results.

The same thing does happen on a 5.3 tree on the Shanghai box too.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8356616

Created attachment 346251
Program to detect when running in an HVM guest
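
The failure mode Chris describes earlier in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel or hypervisor code: the class `PerfCounterMSR`, the function `watchdog_self_test`, and the interval count are hypothetical names and values invented for illustration. It shows why a hypervisor that discards the MSR write leaves the NMI count unchanged, which is what the "(0->0)" in the message reports.

```python
class PerfCounterMSR:
    """Toy stand-in for one performance-counter MSR (hypothetical)."""

    def __init__(self, writes_emulated):
        self.writes_emulated = writes_emulated
        self.armed = False

    def write(self, value):
        # A hypervisor that does not emulate the MSR silently discards the
        # write, so the counter is never armed and can never fire.
        if self.writes_emulated:
            self.armed = True

    def nmis_during(self, intervals):
        # One NMI per elapsed interval, but only if the counter was armed.
        return intervals if self.armed else 0


def watchdog_self_test(msr, intervals=10):
    """Return None on success, or the warning text if no NMIs arrived."""
    before = 0
    msr.write(-1000)  # program the counter; the value is arbitrary here
    after = before + msr.nmis_during(intervals)
    if after <= before:
        return "NMI appears to be stuck (%d->%d)!" % (before, after)
    return None


if __name__ == "__main__":
    print(watchdog_self_test(PerfCounterMSR(writes_emulated=True)))
    print(watchdog_self_test(PerfCounterMSR(writes_emulated=False)))
```

In this model the guest has no way to tell a dropped write from a working one until it counts interrupts, which is why the self-test itself is the reliable check.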
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When booting a fully virtualized Xen or KVM guest, the message "testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!" may be displayed on the guest console. This is due to an implementation issue with the Xen and KVM hypervisors, and can be safely ignored. This implementation issue may be addressed in a future RHEL-5 release.

Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,6 @@
-When booting a fully virtualized Xen or KVM guest, the message "testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!" may be displayed on the guest console. This is due to an implementation issue with the Xen and KVM hypervisors, and can be safely ignored. This implementation issue may be addressed in a future RHEL-5 release.
+When booting a fully virtualized Xen guest, the following message may be displayed on the guest console:
+
+testing NMI watchdog ... <4>
+WARNING: CPU#0: NMI appears to be stuck (0->0)!
+
+This issue is caused by an implementation issue with the Xen hypervisor and can be safely ignored. (BZ#500845)

Status update: After talking with Chris L., implementing perfctr MSR emulation in Xen and KVM probably will never happen for RHEL-5, as it is too difficult to do. Implementing a check this early in boot to determine whether we are on a virtualized guest is difficult to do too. The current recommendation is to work around it in the scripts and close this as WONT_FIX. Opinions?
Cheers,
Don

Considering I posted a patch for it, might as well own the bug.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** This bug has been marked as a duplicate of bug 455323 ***

This bug also shows up when running on Hyper-V guests. Gurhan's hvmdetect.c doesn't work for these virt machines.

I guess I am confused. Aren't Hyper-V guests Microsoft guests? How does a Linux kernel message end up on a Microsoft guest?

Cheers,
Don

To be clear: this is running a regular RHEL distro under Hyper-V, similar to how we would run under VMware.

Ok, yes, there is no code to check whether or not a RHEL guest is running on VMware or Hyper-V. I don't even know how to check for that. We were able to check CPU strings for Xen and KVM. I might have to switch checks if this is going to be an issue and instead output a message stating that the NMI watchdog is disabled because the perf counters are not available.

Cheers,
Don

Don,
I do not have direct access to VMware or Hyper-V hypervisors, but if Google is to be believed, we should be able to check for both of those using mechanisms similar to those for Xen and KVM. In particular, hypervisors commonly put an easily identifiable string in CPUID leaf 0x40000000, and bare-metal machines leave this blank. Therefore, you should be able to call cpuid, get the output, and check for:

VMware - "VMwareVMware"
Hyper-V - "Microsoft Hv"
Xen - "XenVMMXenVMM"
KVM - "KVMKVMKVM"

(the latter two are already implemented, as you said).

All of that being said, we already have a perfectly legitimate test for whether the perf counters are working, and that is the test that causes this message to be printed.
The other option here is just to turn that "NMI appears to be stuck" message into a KERN_DEBUG statement, so it is not so obvious. I'll leave it up to you which way you want to go.

Chris Lalancette

Oh, I forgot to mention: for gory details of detecting the various hypervisors, have a look at virt-what:
http://people.redhat.com/~rjones/virt-what/

Chris Lalancette
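
The CPUID check Chris outlines can be sketched as follows. This is a minimal illustration, not the code from hvmdetect.c or virt-what: the names `KNOWN_SIGNATURES`, `decode_signature` and `identify_hypervisor` are hypothetical, and the sketch assumes the caller has already executed CPUID with EAX=0x40000000 and holds the resulting EBX, ECX and EDX values. Pure Python cannot issue the instruction itself. The 12-byte signature is read from those three registers in that order, and shorter signatures such as KVM's are NUL-padded. As far as I know, the Hyper-V signature is spelled "Microsoft Hv", with a lowercase "v".

```python
import struct

# Signatures listed in the discussion above, with trailing NUL padding removed.
KNOWN_SIGNATURES = {
    "VMwareVMware": "VMware",
    "Microsoft Hv": "Hyper-V",
    "XenVMMXenVMM": "Xen",
    "KVMKVMKVM": "KVM",
}


def decode_signature(ebx, ecx, edx):
    """Join the three 32-bit register values (little-endian) into a string."""
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def identify_hypervisor(ebx, ecx, edx):
    """Return a hypervisor name, or None if the signature is not recognized."""
    return KNOWN_SIGNATURES.get(decode_signature(ebx, ecx, edx))


if __name__ == "__main__":
    # Register values corresponding to the bytes "XenVMMXenVMM".
    print(identify_hypervisor(0x566E6558, 0x65584D4D, 0x4D4D566E))
```

An unrecognized or empty signature yields `None`, so a caller could fall back to the existing perf-counter self-test rather than assume bare metal.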