Red Hat Bugzilla – Bug 470035
xm dmesg printk spam -- Domain attempted WRMSR 00000000000000e8 from 00000016:3d0e9470 to 00000000:00000000
Last modified: 2014-06-18 03:38:15 EDT
I'm seeing a large number of messages pop up on my dom0 serial console with 2.6.18-121.el5.jtltest.53xen:
(XEN) printk: 7 messages suppressed.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 00000029:d7ca940f to 00000000:00000000.
(XEN) printk: 1 messages suppressed.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 0000002a:1b018f99 to 00000000:00000000.
(XEN) printk: 7 messages suppressed.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 0000002b:7ce56d5c to 00000000:00000000.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e7 from 00000038:b3032339 to 00000000:00000000.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 0000002b:c04c7db3 to 00000000:00000000.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e7 from 00000039:04aa3d86 to 00000000:00000000.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 0000002c:24d0e568 to 00000000:00000000.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e7 from 00000039:96d2e77e to 00000000:00000000.
(XEN) printk: 12 messages suppressed.
(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 from 0000002c:83fb2b6e to 00000000:00000000.
...it may have started with earlier kernels -- I'm not sure. I've just noticed it. It's a 2 x dual core CPU box. cpuinfo from one of the cores is below:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
stepping : 11
cpu MHz : 1998.000
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 7484.56
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
...other info available upon request. I can also probably provide access to the box if needed.
Hm, is this all of the time, or just on bootup of a dom0 or of a PV guest? It's common for a guest (including dom0) to do this sort of poking around at boot time, but it shouldn't happen after that. If it's happening more often than that, though, something else might be going on.
I see it pop up pretty regularly. My guests have been active for quite a while now and I still see the message pop up occasionally (at least a few times every minute).
I don't see these messages on a 5.2 system. Again, I'm not sure if this is
even a problem (it's not an artefact from increased debugging in beta
kernels, is it?), but I don't see this happening before 5.3.
Seems that something is different, we are looking into it...
This bugzilla has Keywords: Regression.
Since no regressions are allowed between releases,
it is also being proposed as a blocker for this release.
Please resolve ASAP.
*** Bug 477647 has been marked as a duplicate of this bug. ***
OK. I'm pretty sure I know what this is now, thanks to jlayton allowing me to poke around on his box. For 5.3, we updated the acpi-cpufreq kernel module to use two new Intel MSRs, namely MSR_IA32_APERF and MSR_IA32_MPERF. The problem is, the hypervisor doesn't know anything about these MSRs. So the dom0 is trying to measure the frequency by reading both MSRs.
And then it tries to reset the state of those two MSRs by writing zero back to each of them (the 00000000:00000000 target value in the messages above).
It's the latter, the writes, which are probably spewing all of the messages. This is probably pretty easily fixed by allowing the dom0 to actually do those wrmsr's, which should be a fairly simple patch to teach the hypervisor about them. I'll try to come up with something.
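To tie the diagnosis back to the console output, here is a minimal sketch that decodes one of the hypervisor messages quoted in this report. The index-to-name table uses the architectural MSR numbers (0xe7 is IA32_MPERF, 0xe8 is IA32_APERF). The function name `decode_wrmsr_line` is made up for illustration, and reading the two 32-bit fields as high:low is an assumption based on the first field changing slowly between messages.

```python
import re

# Architectural MSR indices for the two counters named in this report.
MSR_NAMES = {0xE7: "IA32_MPERF", 0xE8: "IA32_APERF"}

# Pattern for the message format quoted in this bug. The high:low ordering of
# the two 32-bit halves is an assumption inferred from the log itself.
WRMSR_RE = re.compile(
    r"\(XEN\) traps\.c:\d+:d(?P<dom>\d+) Domain attempted WRMSR "
    r"(?P<msr>[0-9a-f]{16}) "
    r"from (?P<old_hi>[0-9a-f]{8}):(?P<old_lo>[0-9a-f]{8}) "
    r"to (?P<new_hi>[0-9a-f]{8}):(?P<new_lo>[0-9a-f]{8})\."
)


def decode_wrmsr_line(line):
    """Return a dict describing a refused WRMSR message, or None if no match."""
    m = WRMSR_RE.search(line)
    if m is None:
        return None
    msr = int(m["msr"], 16)
    return {
        "domain": int(m["dom"]),
        "msr": msr,
        "name": MSR_NAMES.get(msr, "unknown"),
        # Current counter value, reassembled from the two 32-bit halves.
        "old": (int(m["old_hi"], 16) << 32) | int(m["old_lo"], 16),
        # Value the domain tried to write.
        "new": (int(m["new_hi"], 16) << 32) | int(m["new_lo"], 16),
    }


sample = (
    "(XEN) traps.c:1761:d0 Domain attempted WRMSR 00000000000000e8 "
    "from 00000029:d7ca940f to 00000000:00000000."
)
info = decode_wrmsr_line(sample)
print(info["name"], hex(info["old"]), info["new"])
```

Run against the sample line, this reports a write of 0 to IA32_APERF from domain 0, which matches the reset-to-zero behaviour described above. Lines such as "printk: 7 messages suppressed." do not match the pattern and return None.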
Created attachment 329207
Allow dom0 to write to the APERF/MPERF MSR
OK, I tested out the attached patch on jlayton's failing box, and it did indeed quiet down the messages. I'll try to float this patch upstream and see what kind of response I get.
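For readers who want to see why permitting the writes quiets the console, here is a toy model of the behaviour discussed in this bug. It is not the attached patch and not Xen code: `ToyHypervisor`, `tick`, and `sample_ratio` are hypothetical names, and the allow-list is just one simple way to picture "teaching the hypervisor" about the two MSRs.

```python
MSR_IA32_MPERF = 0xE7
MSR_IA32_APERF = 0xE8


class ToyHypervisor:
    """Hypothetical model: MSR reads pass through, writes need an allow-list."""

    def __init__(self, writable_msrs=()):
        self.writable = set(writable_msrs)
        self.msrs = {MSR_IA32_MPERF: 0, MSR_IA32_APERF: 0}
        self.log = []

    def tick(self, mperf_cycles, aperf_cycles):
        # Stand-in for the hardware counters advancing over time.
        self.msrs[MSR_IA32_MPERF] += mperf_cycles
        self.msrs[MSR_IA32_APERF] += aperf_cycles

    def rdmsr(self, msr):
        return self.msrs[msr]

    def wrmsr(self, msr, value):
        if msr not in self.writable:
            # Refuse the write and log it, mimicking the message in this bug.
            old = self.msrs[msr]
            self.log.append(
                "Domain attempted WRMSR %016x from %08x:%08x to %08x:%08x."
                % (msr, old >> 32, old & 0xFFFFFFFF,
                   value >> 32, value & 0xFFFFFFFF)
            )
            return
        self.msrs[msr] = value


def sample_ratio(hv):
    """Read APERF/MPERF, then try to reset both, as described in this bug."""
    aperf = hv.rdmsr(MSR_IA32_APERF)
    mperf = hv.rdmsr(MSR_IA32_MPERF)
    hv.wrmsr(MSR_IA32_APERF, 0)
    hv.wrmsr(MSR_IA32_MPERF, 0)
    return aperf / mperf if mperf else 0.0


# Before the fix: the hypervisor knows nothing about the two MSRs.
unpatched = ToyHypervisor()
unpatched.tick(mperf_cycles=3000, aperf_cycles=2000)
ratio = sample_ratio(unpatched)

# After the fix (as modelled here): dom0 may write both MSRs.
patched = ToyHypervisor(writable_msrs={MSR_IA32_MPERF, MSR_IA32_APERF})
patched.tick(mperf_cycles=3000, aperf_cycles=2000)
sample_ratio(patched)

print(len(unpatched.log), len(patched.log))
```

In this model every sampling pass against the unpatched hypervisor leaves two refused-write messages and never resets the counters, while the patched one logs nothing and the counters return to zero.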
FYI; this was committed in upstream Xen as xen-unstable c/s 19055.
I've uploaded a test kernel that contains this fix (along with several others)
to this location:
Could the original reporter try out the test kernels there, and report back if
it fixes the problem?
Booted to the new kernel and no longer see these messages. Looks like the patch works!
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.