Bug 1063836

Summary:	kvm: 23090: cpu0 unhandled wrmsr 0x391 data 2000000f
Product:	Red Hat Enterprise Linux 6
Reporter:	Amador Pahim <asegundo>
Component:	kernel
Assignee:	Radim Krčmář <rkrcmar>
Status:	CLOSED ERRATA
QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium
Priority:	unspecified
Version:	6.5
CC:	areis, asegundo, bsarathy, chayang, drjones, hhuang, juzhang, michen, mkenneth, qzhang, rbalakri, rkrcmar, virt-maint
Target Milestone:	rc
Hardware:	x86_64
OS:	Linux
Fixed In Version:	kernel-2.6.32-460.el6
Doc Type:	Bug Fix
Last Closed:	2014-10-14 05:56:08 UTC
Type:	Bug

Description Amador Pahim 2014-02-11 13:50:31 UTC

Description of problem:
A message is printed on the host console when running KVM guests.

Version-Release number of selected component (if applicable):
Kernel 2.6.32-431.3.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run KVM guests on RHEL x86 hardware.
2. Wait for the messages on the host console.

Actual results:
Harmless but annoying messages on the console:

 kvm: 23090: cpu0 unhandled wrmsr 0x391 data 2000000f 

Expected results:
Suppress the messages to reduce support load.

Additional info:
This specific "0x391" register is documented in the Intel Software Developer's Manual: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

 391H 913 MSR_UNCORE_PERF_GLOBAL_CTRL Package See Section 18.7.2.1, “Uncore Performance Monitoring Management Facility.”

Since this is a performance monitoring register, KVM probably does not implement it because it makes little sense to have two kernels (host and guest) monitoring the performance of the same processor. It could also make the monitoring data inconsistent.

The messages show the guest trying to access a CPU register that KVM neither emulates nor exposes. This is harmless and can be ignored. Since it is not the KVM module but the core kernel that writes to the register, I'm not sure we can prevent it in any way:

$ grep -e NHM_UNC_PERF_GLOBAL_CTL -e SNB_UNC_PERF_GLOBAL_CTL -r *
arch/x86/kernel/cpu/perf_event_intel_uncore.c:          wrmsrl(SNB_UNC_PERF_GLOBAL_CTL,
arch/x86/kernel/cpu/perf_event_intel_uncore.c:  wrmsrl(NHM_UNC_PERF_GLOBAL_CTL, 0);
arch/x86/kernel/cpu/perf_event_intel_uncore.c:  wrmsrl(NHM_UNC_PERF_GLOBAL_CTL, NHM_UNC_GLOBAL_CTL_EN_PC_ALL | NHM_UNC_GLOBAL_CTL_EN_FC);
arch/x86/kernel/cpu/perf_event_intel_uncore.h:#define SNB_UNC_PERF_GLOBAL_CTL                 0x391
arch/x86/kernel/cpu/perf_event_intel_uncore.h:#define NHM_UNC_PERF_GLOBAL_CTL                 0x391
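
The value in the message can be read against the register layout. The following is a minimal sketch, assuming the Sandy Bridge layout of the uncore global control MSR (bits 0-3 select the cores, bit 29 is the global enable). The constant names mirror those of the upstream uncore driver but are reproduced here for illustration and were not copied from the RHEL 6 source; decode_snb_global_ctl is a hypothetical helper.

```python
# Sketch: decode the value from "unhandled wrmsr 0x391 data 2000000f".
# Assumption: Sandy Bridge layout of the uncore global control MSR,
# with bits 0-3 as per-core enables and bit 29 as the global enable.

SNB_UNC_PERF_GLOBAL_CTL = 0x391
SNB_UNC_GLOBAL_CTL_CORE_ALL = (1 << 4) - 1  # bits 0-3: one bit per core
SNB_UNC_GLOBAL_CTL_EN = 1 << 29             # bit 29: enable uncore counters


def decode_snb_global_ctl(value):
    """Split a written value into the fields assumed above."""
    known = SNB_UNC_GLOBAL_CTL_CORE_ALL | SNB_UNC_GLOBAL_CTL_EN
    return {
        "core_mask": value & SNB_UNC_GLOBAL_CTL_CORE_ALL,
        "enable": bool(value & SNB_UNC_GLOBAL_CTL_EN),
        "other_bits": value & ~known,  # anything outside the assumed fields
    }


if __name__ == "__main__":
    print(decode_snb_global_ctl(0x2000000F))
```

Under that assumption the logged data is simply "enable, all four cores", which would be consistent with a guest kernel initializing the uncore PMU rather than with anything unusual being written.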

Comment 1 Amador Pahim 2014-02-12 11:28:50 UTC

It seems that upstream kernel commit a05123bdd1b9ba961ed262864924a5b3ee81afe8 fixes the issue. I'm running a patched downstream build to test.

Comment 2 Ademar Reis 2014-02-13 18:11:45 UTC

(In reply to Amador Pahim from comment #1)
> Seem like upstream kernel commit a05123bdd1b9ba961ed262864924a5b3ee81afe8
> fixes the issue. I'm running a patched downstream build to test.

Amador: please let us know the result (setting NEEDINFO)

Comment 3 Amador Pahim 2014-02-13 19:14:25 UTC

Yes, it works. No messages appear when running RHEL guests with the patched kernel.
Please backport a05123bdd1b9ba961ed262864924a5b3ee81afe8.

Comment 4 Amador Pahim 2014-02-13 19:35:24 UTC

As per the commit description, it also avoids a kernel hang:

[tip:perf/urgent] perf/x86: Disable uncore on virtualized CPUs
Commit-ID:  a05123bdd1b9ba961ed262864924a5b3ee81afe8

perf/x86: Disable uncore on virtualized CPUs

Initializing uncore PMU on virtualized CPU may hang the kernel.
This is because kvm does not emulate the entire hardware. There
are lots of uncore related MSRs, making kvm enumerate them all
is a non-trivial task. So just disable uncore on virtualized CPU.
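
The idea in the commit description (skip uncore setup when running virtualized) can be illustrated from user space. This is only a sketch under assumptions: it assumes a virtualized x86 Linux guest lists "hypervisor" among the flags in /proc/cpuinfo, and has_hypervisor_flag and should_init_uncore are hypothetical names, not functions from the kernel patch.

```python
# Sketch: user-space analogue of "disable uncore on virtualized CPUs".
# Assumption: a guest reports the "hypervisor" CPU flag in /proc/cpuinfo.


def has_hypervisor_flag(cpuinfo_text):
    """Return True if the first 'flags' line lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "hypervisor" in value.split()
    return False


def should_init_uncore(cpuinfo_text):
    """Mirror the commit's policy: never touch uncore when virtualized."""
    return not has_hypervisor_flag(cpuinfo_text)


if __name__ == "__main__":
    # Shortened sample flag lines, not real machine output.
    bare_metal = "flags\t\t: fpu vme de pse tsc msr\n"
    guest = "flags\t\t: fpu vme de pse tsc msr hypervisor\n"
    print("bare metal:", should_init_uncore(bare_metal))
    print("guest:", should_init_uncore(guest))
```

On a real system the text passed in would be the contents of /proc/cpuinfo. The point of the design is that the guest declines to probe the hardware at all, instead of relying on the hypervisor to emulate every uncore MSR.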

Comment 5 Qunfang Zhang 2014-02-14 08:59:58 UTC

Reproduced on kernel-2.6.32-440.el6.x86_64 & qemu-kvm-rhev-0.12.1.2-2.420.el6.x86_64 by running a RHEL guest on the host and checking the host dmesg:

kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f
kvm: 960: cpu0 unhandled wrmsr: 0x391 data 2000000f
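
When checking many hosts, it may help to summarize these lines instead of reading dmesg by eye. The following is a rough sketch that assumes the two message forms seen in this report (with and without a colon after the operation); the pattern and the summarize_unhandled_msrs helper are illustrative and are not part of any KVM tool.

```python
import re
from collections import Counter

# Matches the forms seen in this report:
#   kvm: 23090: cpu0 unhandled wrmsr 0x391 data 2000000f
#   kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f
#   kvm: 32142: cpu0 unhandled rdmsr: 0xc0010112
UNHANDLED_MSR = re.compile(
    r"kvm: (?P<pid>\d+): cpu(?P<cpu>\d+) unhandled (?P<op>rdmsr|wrmsr):? "
    r"(?P<msr>0x[0-9a-fA-F]+)(?: data (?P<data>[0-9a-fA-F]+))?"
)


def summarize_unhandled_msrs(log_text):
    """Count unhandled MSR accesses per (operation, MSR number)."""
    counts = Counter()
    for match in UNHANDLED_MSR.finditer(log_text):
        counts[(match.group("op"), int(match.group("msr"), 16))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f\n"
        "kvm: 960: cpu0 unhandled wrmsr: 0x391 data 2000000f\n"
    )
    for (op, msr), count in sorted(summarize_unhandled_msrs(sample).items()):
        print("%s 0x%x: %d" % (op, msr, count))
```

An empty result for ("wrmsr", 0x391) after booting a guest would be one way to express the "no messages" check used later in this report.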

Comment 9 Radim Krčmář 2014-04-17 22:01:35 UTC

http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html
reports that the guest dies from #GP when uncore is probed, but it is not yet clear why RHEL6 doesn't.

Have your guests always booted fine?

Comment 10 Qunfang Zhang 2014-04-18 02:32:34 UTC

(In reply to Radim Krčmář from comment #9)
> http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html
> reports that the guest dies from #GP when uncore is probed, but it is not
> yet clear why RHEL6 doesn't.
> 
> Have your guests always booted fine?

Yes, in my experiments, the guests booted successfully even though the "kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f" message was printed in the host dmesg. Sorry, what does "when uncore is probed" mean?

Comment 11 Radim Krčmář 2014-04-18 10:01:18 UTC

(In reply to Qunfang Zhang from comment #10)
> (In reply to Radim Krčmář from comment #9)
> > http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html
> > reports that the guest dies from #GP when uncore is probed, but it is not
> > yet clear why RHEL6 doesn't.
> > 
> > Have your guests always booted fine?
> 
> Yes, in my experiments, the guests could boot up successfully even the "kvm:
> 960: cpu1 unhandled wrmsr: 0x391 data 2000000f" log printed in the host
> dmesg. Sorry what does the "when uncore is probed" mean?

Thanks,
I meant that the upstream bug happens when we initialize uncore without really knowing it is present. (The debug message appears because of this.)

I also looked into why we don't #GP: paravirtualization always converts MSR operations into their respective safe variants, so the exception is handled.
(Not the cleanest design decision, but it works :)
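
The difference between a plain and a "safe" MSR write can be pictured with a toy model. This is not kernel code: the exception class, the EMULATED_MSRS set and the return values are all hypothetical stand-ins chosen only to show why the guest keeps booting.

```python
# Toy model, not kernel code: a plain MSR write lets the fault propagate,
# while the "safe" variant catches it and reports an error code instead.


class GeneralProtectionFault(Exception):
    """Stand-in for the #GP raised on an access to an unemulated MSR."""


# Hypothetical set of MSRs the hypervisor emulates; 0x391 is absent.
EMULATED_MSRS = {0x10}


def wrmsr(msr, value):
    """Plain write: an unemulated MSR faults and nothing handles it here."""
    if msr not in EMULATED_MSRS:
        raise GeneralProtectionFault(hex(msr))


def wrmsr_safe(msr, value):
    """Safe write: the fault is caught and turned into a nonzero result."""
    try:
        wrmsr(msr, value)
    except GeneralProtectionFault:
        return -1  # placeholder error code, not the kernel's actual value
    return 0


if __name__ == "__main__":
    print("safe write to 0x391 returned", wrmsr_safe(0x391, 0x2000000F))
```

In this picture the guest only ever calls the safe form, so the unemulated write shows up as a log line on the host and an ignored error in the guest.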

Comment 12 RHEL Program Management 2014-04-22 13:21:16 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 13 Rafael Aquini 2014-04-24 00:23:55 UTC

Patch(es) available on kernel-2.6.32-460.el6

Comment 16 Qunfang Zhang 2014-07-15 10:08:29 UTC

Reproduced on kernel-2.6.32-431.22.1.el6.x86_64 (host).

After booting a RHEL guest, the host dmesg shows:

[root@amd-1216-8-3 ~]# dmesg 
tap0: no IPv6 routers present
kvm: 32142: cpu0 unhandled rdmsr: 0xc0010112
kvm: 32142: cpu0 unhandled rdmsr: 0xc0010001

CLI:

# /usr/libexec/qemu-kvm -cpu Opteron_G1  -M pc -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -drive file=/root/RHEL-Server-6.5-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3  -vnc :10 -vga std  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -usb -device usb-tablet -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2 

Re-tested with the latest kernel-2.6.32-491.el6.x86_64 (both host and guest): same result.

Host dmesg:

[root@amd-1216-8-3 ~]# dmesg 
device tap0 entered promiscuous mode
switch: port 2(tap0) entering forwarding state
tap0: no IPv6 routers present
kvm: 6848: cpu0 unhandled rdmsr: 0xc0010112
kvm: 6848: cpu0 unhandled rdmsr: 0xc0010001

Host info:

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 67
model name	: Dual-Core AMD Opteron(tm) Processor 1216
stepping	: 3
cpu MHz		: 1000.000
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips	: 2009.23
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


Hi, Radim

Any idea about it? 

Thanks!
Qunfang

Comment 17 Radim Krčmář 2014-07-15 12:28:45 UTC

Those are AMD-specific MSRs, probing features that we don't emulate:

 0xc0010112 = MSR_K8_TSEG_ADDR -- reserved memory for SMM
 0xc0010001 = MSR_K7_EVNTSEL1  -- performance event selector (Bug 928284)

Unlike Intel uncore, we might implement those features in the future. Our customers likely don't ask about them much, and patches to disable them are not upstream, so it is not worth hiding them right now ...
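
For triage, the MSR numbers named in this report can be kept in a small lookup table. This is a sketch only: the table holds just the three registers discussed here, with the names and notes taken from the comments above, and describe_msr is a hypothetical helper.

```python
# Sketch: annotate MSR numbers with the names given in this bug report.
# The table is intentionally limited to registers mentioned in the thread.

KNOWN_MSRS = {
    0x391: ("MSR_UNCORE_PERF_GLOBAL_CTRL", "Intel uncore performance monitoring"),
    0xC0010112: ("MSR_K8_TSEG_ADDR", "reserved memory for SMM"),
    0xC0010001: ("MSR_K7_EVNTSEL1", "performance event selector"),
}


def describe_msr(msr):
    """Return a one-line description, or mark the MSR as unknown."""
    name, note = KNOWN_MSRS.get(msr, ("unknown", "not mentioned in this report"))
    return "0x%x = %s -- %s" % (msr, name, note)


if __name__ == "__main__":
    for number in (0x391, 0xC0010112, 0xC0010001):
        print(describe_msr(number))
```

Keeping the Intel and AMD entries side by side makes it easier to see that only 0x391 is in scope for this bug.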

Comment 18 Qunfang Zhang 2014-07-18 08:42:29 UTC

(In reply to Radim Krčmář from comment #17)
> Those are AMD specific MSRs, to probe what we don't emulate
> 
>  0xc0010112 = MSR_K8_TSEG_ADDR -- reserved memory for SMM
>  0xc0010001 = MSR_K7_EVNTSEL1  -- performance event selector (Bug 928284)
> 
> Unlike Intel Uncore, we might implement those features in the future, our
> customers likely don't ask about them much, and patches to disable it are
> not upstream, so it is not worth to hide them right now ...

So for this bug we only need to focus on the 0x391 register, right? However, I could not reproduce it on either the old host kernel-2.6.32-437.el6.x86_64 or the newer kernel-2.6.32-489.el6.x86_64.

I just booted a RHEL 6.6 guest on an Intel host:

CLI:
 /usr/libexec/qemu-kvm -cpu SandyBridge -M rhel6.5.0 -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -drive file=/root/RHEL-Server-6.6-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3  -vnc :10 -vga std  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -usb -device usb-tablet -device virtio-serial-pci,bus=pci.0,addr=0x9,id=virtio-serial0 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0

Host info:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping	: 7
microcode	: 41
cpu MHz		: 1600.000
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6184.30
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


Hi, Radim

Actually, I always hit these messages in the past, but I cannot reproduce them now. Do we need a special host, e.g. one with a particular CPU flag?

Thanks,
Qunfang

Comment 19 Qunfang Zhang 2014-08-18 07:40:30 UTC

Now I can reproduce this bug on one of my Intel hosts with kernel-2.6.32-431.el6.x86_64, and I verified that it passes with kernel-2.6.32-496.el6.x86_64.

CLI:

# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm -name rhel6.6 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -nodefaults -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on  -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor unix:/tmp/monitor-unix,nowait,server -drive file=/root/RHEL-Server-6.6-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4 -vga std -vnc :10 -usb -device usb-tablet

Reproduced on the following versions:

Host:
kernel-2.6.32-431.el6.x86_64
qemu-kvm-0.12.1.2-2.415.el6.x86_64
Guest:
kernel-2.6.32-431.el6.x86_64

After booting the guest, the host dmesg displays:

kvm: 2849: cpu0 unhandled wrmsr: 0x391 data 2000000f
kvm: 2849: cpu0 unhandled wrmsr: 0x391 data 2000000f


Verified as passing on the following versions:

Host:
kernel-2.6.32-496.el6.x86_64
qemu-kvm-0.12.1.2-2.437.el6.x86_64

Guest:
kernel-2.6.32-483.el6.x86_64

After booting the guest (and even rebooting it several times), the "kvm: 2849: cpu0 unhandled wrmsr: 0x391 data 2000000f" message is no longer displayed in the host dmesg.

==================

Host cpuinfo:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 42
model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping	: 7
microcode	: 41
cpu MHz		: 1600.000
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips	: 6185.56
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:


Based on the above, the bug is fixed.

Comment 20 errata-xmlrpc 2014-10-14 05:56:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html