Bug 1063836
Summary: | kvm: 23090: cpu0 unhandled wrmsr 0x391 data 2000000f | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Amador Pahim <asegundo> |
Component: | kernel | Assignee: | Radim Krčmář <rkrcmar> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.5 | CC: | areis, asegundo, bsarathy, chayang, drjones, hhuang, juzhang, michen, mkenneth, qzhang, rbalakri, rkrcmar, virt-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.32-460.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-10-14 05:56:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Amador Pahim
2014-02-11 13:50:31 UTC
Seems like upstream kernel commit a05123bdd1b9ba961ed262864924a5b3ee81afe8 fixes the issue. I'm running a patched downstream build to test.

(In reply to Amador Pahim from comment #1)
> Seems like upstream kernel commit a05123bdd1b9ba961ed262864924a5b3ee81afe8
> fixes the issue. I'm running a patched downstream build to test.

Amador: please let us know the result (setting NEEDINFO).

Yes, it works. No messages appear when running RHEL guests with the patched kernel. Please backport a05123bdd1b9ba961ed262864924a5b3ee81afe8. As per the commit description, it also avoids a kernel hang:

[tip:perf/urgent] perf/x86: Disable uncore on virtualized CPUs

Commit-ID: a05123bdd1b9ba961ed262864924a5b3ee81afe8

perf/x86: Disable uncore on virtualized CPUs

Initializing uncore PMU on virtualized CPU may hang the kernel. This is because kvm does not emulate the entire hardware. Thers are lots of uncore related MSRs, making kvm enumerate them all is a non-trival task. So just disable uncore on virtualized CPU.

Reproduced on kernel-2.6.32-440.el6.x86_64 & qemu-kvm-rhev-0.12.1.2-2.420.el6.x86_64. Run a RHEL guest on the host and check the host dmesg:

kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f
kvm: 960: cpu0 unhandled wrmsr: 0x391 data 2000000f

http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html reports that the guest dies from #GP when uncore is probed, but it is not yet clear why RHEL6 doesn't.

Have your guests always booted fine?

(In reply to Radim Krčmář from comment #9)
> http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html
> reports that the guest dies from #GP when uncore is probed, but it is not
> yet clear why RHEL6 doesn't.
>
> Have your guests always booted fine?

Yes, in my experiments the guests booted successfully even though the "kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f" message was printed in the host dmesg. Sorry, what does "when uncore is probed" mean?
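The commit description quoted above boils down to "do not set up uncore when the CPU is virtualized". The upstream change makes that decision inside the kernel; the snippet below is only a userspace analogy in Python, assuming the guest's /proc/cpuinfo lists a `hypervisor` flag when running under KVM. The helper names `cpu_flags` and `should_init_uncore` are hypothetical and are not part of the patch.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" line is relevant; other keys are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def should_init_uncore(cpuinfo_text):
    """Illustrative policy: skip uncore setup when a hypervisor is detected.

    Assumption: a virtualized CPU advertises the "hypervisor" flag.
    """
    return "hypervisor" not in cpu_flags(cpuinfo_text)
```

Inside a guest, something like `should_init_uncore(open("/proc/cpuinfo").read())` would be expected to return False, provided the hypervisor flag is exposed to the guest.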
(In reply to Qunfang Zhang from comment #10)
> (In reply to Radim Krčmář from comment #9)
> > http://www.mail-archive.com/kvm@vger.kernel.org/msg77524.html
> > reports that the guest dies from #GP when uncore is probed, but it is not
> > yet clear why RHEL6 doesn't.
> >
> > Have your guests always booted fine?
>
> Yes, in my experiments the guests booted successfully even though the "kvm:
> 960: cpu1 unhandled wrmsr: 0x391 data 2000000f" message was printed in the
> host dmesg. Sorry, what does "when uncore is probed" mean?

Thanks, I meant that the upstream bug happens when we initialize uncore without really knowing it is present. (The debug message appears because of this.)

I also looked into why we don't #GP: paravirtualization always converts MSR operations into their respective safe variants, so the exception is handled. (Not the cleanest design decision, but it works :)

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

Patch(es) available on kernel-2.6.32-460.el6

Reproduced on kernel-2.6.32-431.22.1.el6.x86_64 (host).
After booting a RHEL guest, the host dmesg shows:

[root@amd-1216-8-3 ~]# dmesg
tap0: no IPv6 routers present
kvm: 32142: cpu0 unhandled rdmsr: 0xc0010112
kvm: 32142: cpu0 unhandled rdmsr: 0xc0010001

CLI:
# /usr/libexec/qemu-kvm -cpu Opteron_G1 -M pc -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -drive file=/root/RHEL-Server-6.5-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3 -vnc :10 -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -usb -device usb-tablet -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2

Re-tested with the latest kernel-2.6.32-491.el6.x86_64 (both host and guest), same result.
Host dmesg:

[root@amd-1216-8-3 ~]# dmesg
device tap0 entered promiscuous mode
switch: port 2(tap0) entering forwarding state
tap0: no IPv6 routers present
kvm: 6848: cpu0 unhandled rdmsr: 0xc0010112
kvm: 6848: cpu0 unhandled rdmsr: 0xc0010001

Host info:

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : Dual-Core AMD Opteron(tm) Processor 1216
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.23
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

Hi, Radim

Any idea about it?

Thanks!
Qunfang

Those are AMD-specific MSRs, used to probe features we don't emulate:

0xc0010112 = MSR_K8_TSEG_ADDR -- reserved memory for SMM
0xc0010001 = MSR_K7_EVNTSEL1 -- performance event selector (Bug 928284)

Unlike Intel Uncore, we might implement those features in the future, our customers likely don't ask about them much, and patches to disable them are not upstream, so it is not worth hiding them right now ...

(In reply to Radim Krčmář from comment #17)
> Those are AMD specific MSRs, to probe what we don't emulate
>
> 0xc0010112 = MSR_K8_TSEG_ADDR -- reserved memory for SMM
> 0xc0010001 = MSR_K7_EVNTSEL1 -- performance event selector (Bug 928284)
>
> Unlike Intel Uncore, we might implement those features in the future, our
> customers likely don't ask about them much, and patches to disable it are
> not upstream, so it is not worth to hide them right now ...

So for this bug we only need to focus on the 0x391 register, right?
However, I could not reproduce it on either the old host kernel-2.6.32-437.el6.x86_64 or the newer kernel-2.6.32-489.el6.x86_64. I just booted a rhel6.6 guest on an Intel host:

CLI:
/usr/libexec/qemu-kvm -cpu SandyBridge -M rhel6.5.0 -enable-kvm -m 2048 -smp 4,sockets=2,cores=2,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -drive file=/root/RHEL-Server-6.6-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3 -vnc :10 -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -usb -device usb-tablet -device virtio-serial-pci,bus=pci.0,addr=0x9,id=virtio-serial0 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,name=org.qemu.guest_agent.0

Host info:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
microcode : 41
cpu MHz : 1600.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6184.30
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Hi, Radim

Actually, I always hit these messages in the past, but I cannot reproduce them now. Do we need some special host? E.g., is some CPU flag needed?

Thanks,
Qunfang

Now I could reproduce this bug on one of my Intel hosts on kernel-2.6.32-431.el6.x86_64, and verified that it passes on kernel-2.6.32-496.el6.x86_64.

CLI:
# /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm -name rhel6.6 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -nodefaults -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor unix:/tmp/monitor-unix,nowait,server -drive file=/root/RHEL-Server-6.6-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4 -vga std -vnc :10 -usb -device usb-tablet

Reproduced on the following versions:
Host:
kernel-2.6.32-431.el6.x86_64
qemu-kvm-0.12.1.2-2.415.el6.x86_64
Guest:
kernel-2.6.32-431.el6.x86_64

After booting the guest, the host dmesg displays:

kvm: 2849: cpu0 unhandled wrmsr: 0x391 data 2000000f
kvm: 2849: cpu0 unhandled wrmsr: 0x391 data 2000000f

Verified as passing on the following versions:
Host:
kernel-2.6.32-496.el6.x86_64
qemu-kvm-0.12.1.2-2.437.el6.x86_64
Guest:
kernel-2.6.32-483.el6.x86_64

After booting the guest (and even rebooting it several times), there is no "kvm:
2849: cpu0 unhandled wrmsr: 0x391 data 2000000f" message displayed in the host dmesg.

==================
Host cpuinfo:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
stepping : 7
microcode : 41
cpu MHz : 1600.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.56
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Based on the above, the bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html
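The reproduce/verify procedure used throughout this report (boot a guest, then look in the host dmesg for an unhandled wrmsr to 0x391) could be automated roughly as follows. This is a minimal Python sketch, not a tool used by anyone in the thread: the regular expression is modeled only on the log lines quoted in the comments, and the names `unhandled_msr_accesses` and `bug_reproduced` are hypothetical.

```python
import re

# Modeled on host dmesg lines quoted in this report, e.g.
#   kvm: 960: cpu1 unhandled wrmsr: 0x391 data 2000000f
#   kvm: 32142: cpu0 unhandled rdmsr: 0xc0010112
UNHANDLED_MSR = re.compile(
    r"kvm: (?P<pid>\d+): cpu(?P<cpu>\d+) unhandled (?P<op>rdmsr|wrmsr):? "
    r"(?P<msr>0x[0-9a-fA-F]+)(?: data (?P<data>[0-9a-fA-F]+))?"
)

UNCORE_MSR = 0x391  # the register this bug is about


def unhandled_msr_accesses(dmesg_text):
    """Return (op, msr, data) tuples; data is None when no value was logged."""
    hits = []
    for m in UNHANDLED_MSR.finditer(dmesg_text):
        data = int(m.group("data"), 16) if m.group("data") else None
        hits.append((m.group("op"), int(m.group("msr"), 16), data))
    return hits


def bug_reproduced(dmesg_text):
    """True if the log contains an unhandled wrmsr to 0x391."""
    return any(
        op == "wrmsr" and msr == UNCORE_MSR
        for op, msr, _ in unhandled_msr_accesses(dmesg_text)
    )
```

Feeding it the host's dmesg output after booting a guest would separate this bug from the unrelated AMD rdmsr messages (0xc0010112, 0xc0010001) discussed in the comments, since only a wrmsr to 0x391 counts as a reproduction.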