Bug 905094
Summary: | kvm emulate_invalid_guest_state broken (fixed in 3.9) |
---|---|
Product: | [Fedora] Fedora |
Component: | kernel |
Version: | 17 |
Hardware: | i686 |
OS: | Linux |
Status: | CLOSED ERRATA |
Severity: | medium |
Priority: | unspecified |
Keywords: | Reopened |
Reporter: | Stefan Jensen <sjensen> |
Assignee: | Kernel Maintainer List <kernel-maint> |
QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
CC: | berrange, clalancette, crobinso, gansalmon, gleb, itamar, jforbes, jonathan, jyang, kernel-maint, laine, libvirt-maint, madhu.chinakonda, mtosatti, sjensen, veillard |
Fixed In Version: | kernel-3.9.8-100.fc17 |
Doc Type: | Bug Fix |
Type: | Bug |
Last Closed: | 2013-07-01 01:31:47 UTC |
Description
Stefan Jensen
2013-01-28 15:02:13 UTC
What kernel version? Does booting from an older kernel make any difference?

Current kernel is 3.7.4-104.fc17.i686.PAE

Unfortunately I have uninstalled the older kernel already. If really needed, I'll reinstall an older one. But looking through yum.log, there seems to have been no package update to libvirt or related packages between the last kernel update and the current one. So I strongly assume the older kernel worked.

Stefan, finding out the exact kernel version that breaks things will help determine the bit that changed, and whether this is accidental or intentional, etc. Do you know at least what the last kernel version was?

> VM can not be started, enters "pause" state.

I don't know of anything related to cgroups that can cause the VM to silently transition to the pause state. What would cause that is if QEMU saw an I/O error on one of the disks, or if there was a KVM exception. Please provide the /var/log/libvirt/qemu/$VMNAME.log file for the VM showing this behaviour.

> kernel: [ 124.456682] cgroup: libvirtd (1568) created nested cgroup for
> controller "memory" which has incomplete hierarchy support. Nested groups
> may change behavior in the future.
> kernel: [ 124.456690] cgroup: "memory" requires setting use_hierarchy to 1
> on the root.
> kernel: [ 124.456845] cgroup: libvirtd (1568) created nested cgroup for
> controller "devices" which has incomplete hierarchy support. Nested cgroups
> may change behavior in the future.
> kernel: [ 124.456937] cgroup: libvirtd (1568) created nested cgroup for
> controller "freezer" which has incomplete hierarchy support. Nested cgroups
> may change behavior in the future.
> kernel: [ 124.457064] cgroup: libvirtd (1568) created nested cgroup for
> controller "blkio" which has incomplete hierarchy support. Nested cgroups
> may change behavior in the future.

These messages from the kernel are all merely warnings.
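As a small aid for the log request above, the following sketch pulls KVM entry-failure lines out of a per-VM QEMU log. The function name `kvm_entry_failures` is hypothetical, and the search string assumes the "KVM: entry failed" wording that QEMU prints for this class of error (the same wording quoted later in this report); other failure modes would need a different pattern.

```shell
#!/bin/sh
# Hypothetical helper: print KVM entry-failure lines from a libvirt QEMU log.
# Usage: kvm_entry_failures /var/log/libvirt/qemu/$VMNAME.log
kvm_entry_failures() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # -n prefixes each matching line with its line number in the log.
    # grep exits with status 1 when nothing matches.
    grep -n 'KVM: entry failed' "$log_file"
}
```

An empty result with exit status 1 only means the log contains no line with that wording; it does not rule out other causes of a paused guest, such as the disk I/O error mentioned above.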
I forgot to mention that I use fairly early VT hardware (Intel L2400 @1.66GHz) and had to set a kvm_intel module option to get it to run on that CPU and to avoid the famous "KVM: entry failed, hardware error 0x80000021" error.

This used to work:

# cat /etc/modprobe.d/thinkpad-kvm-intel.conf
options kvm_intel emulate_invalid_guest_state=0

There are no kernels in the repo older than 3.7.3-101, so I have now tested only two kernels.

kernel-PAE-3.7.3-101.fc17.i686 = doesn't work (even with "emulate_invalid_guest_state=0")
kernel-PAE-3.7.4-104.fc17.i686 = doesn't work (even with "emulate_invalid_guest_state=0")

Both of them failed with "KVM: entry failed, hardware error 0x80000021", even with "emulate_invalid_guest_state=0", which worked prior to 3.7.3-101.fc17.i686.

From $VMNAME.log:

KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.
EAX=00000011 EBX=18ae1000 ECX=00006a32 EDX=000fffa9
ESI=0ffeb084 EDI=00000000 EBP=000069f2 ESP=000069f2
EIP=0000c4cd EFL=00010006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =fd39 000fd390 ffffffff 00809300 DPL=0 DS16 [-WA]
CS =f000 000f0000 0000ffff 00009b00 DPL=0 CS16 [-RA]
SS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
DS =0030 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
FS =0030 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =ca00 000ca000 ffffffff 00809300 DPL=0 DS16 [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000fd3a8 00000037
IDT= 000fd3e6 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000700000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=01 1e e0 d3 2e 0f 01 16 a0 d3 0f 20 c0 66 83 c8 01 0f 22 c0 <66> ea d5 c4 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89 c8 ff e2 89 c1 b8

I think this is caused by emulate_invalid_guest_state being broken in kernel 3.7 and 3.8. Fixes are queued for 3.9.

*** Bug 873874 has been marked as a duplicate of this bug. ***

Err, didn't mean to close it.

kernel-3.9.4-100.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.4-100.fc17

Package kernel-3.9.4-101.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.4-101.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.4-101.fc17
then log in and leave karma (feedback).

Still an issue for me, even with kernel-3.9.4-101. Same behavior as in my comment no #5.

(In reply to Stefan Jensen from comment #11)
> Still an issue for me. Even with kernel-3.9.4-101. Same behavior as in my
> comment no #5.

What is the exact register dump now? Also try with emulate_invalid_guest_state=1.

Package kernel-3.9.5-100.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.5-100.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.5-100.fc17
then log in and leave karma (feedback).

Package kernel-3.9.5-101.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.5-101.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.5-101.fc17
then log in and leave karma (feedback).

Sorry, still no luck. Tested with "kernel-3.9.5-100.fc17" and "kernel-3.9.5-101.fc17", both with "emulate_invalid_guest_state=1|0". The VM now does not go to "pause", but instead eats 100% CPU forever. No errors in the log.

Eats 100% CPU with both emulate_invalid_guest_state=1 and 0? What is the QEMU command line? What is the output of "info registers" when the VM hangs?

Package kernel-3.9.7-100.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.7-100.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.7-100.fc17
then log in and leave karma (feedback).

kernel-3.9.8-100.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
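When testing with emulate_invalid_guest_state set to 0 and to 1 as discussed above, it can help to confirm which value the loaded module is actually using, since an `options` line in /etc/modprobe.d only takes effect when kvm_intel is (re)loaded. The sketch below is a minimal illustration under two assumptions: that the kernel exposes the parameter read-only under /sys/module/kvm_intel/parameters/, and that boolean parameters are shown there as Y or N. The function name `kvm_intel_param` and the optional sysfs-root argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of a kvm_intel module parameter.
# The second argument (sysfs root, default /sys) exists only so the function
# can be pointed at a scratch directory when testing.
kvm_intel_param() {
    param=$1
    sysfs_root=${2:-/sys}
    param_file="$sysfs_root/module/kvm_intel/parameters/$param"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        # Module not loaded, or this kernel does not expose the parameter.
        echo "unavailable"
        return 1
    fi
}

# Example: kvm_intel_param emulate_invalid_guest_state
```

If the printed value does not match the modprobe.d setting, the module was probably loaded before the option file was changed.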
Created attachment 767324
Log entries from a failed kvm session
Sorry, but still no luck. Please see the attached logfile(s).
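For readers going through such logs, the "hardware error 0x80000021" value quoted earlier in this report can be split into its parts. The sketch below assumes the value is the raw VMX exit reason as KVM passes it to QEMU, with bit 31 flagging a VM-entry failure and the low 16 bits holding the basic exit reason; under that reading, basic reason 33 corresponds to "VM-entry failure due to invalid guest state" in Intel's documentation. The function name `decode_vmx_exit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a VMX "hardware error" value into its
# VM-entry-failure bit and basic exit reason.
decode_vmx_exit() {
    code=$(( $1 ))                        # accepts hex such as 0x80000021
    entry_failed=$(( (code >> 31) & 1 ))  # bit 31: VM-entry failure
    basic_reason=$(( code & 0xffff ))     # bits 15:0: basic exit reason
    echo "entry_failure=$entry_failed basic_reason=$basic_reason"
}

decode_vmx_exit 0x80000021   # prints: entry_failure=1 basic_reason=33
```

This matches the QEMU hint printed alongside the error, which points at the guest entering a state that the CPU's VT implementation cannot run directly.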
I believe this is the same as http://www.spinics.net/lists/stable/msg12953.html. I am curious what your host CPU is. Paste the output of "cat /proc/cpuinfo" here.

As stated in my comment #5, it is an Intel L2400 @1.66GHz CPU. Full output of cpuinfo comes here:

# cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU L2400 @ 1.66GHz
stepping : 8
microcode : 0x39
cpu MHz : 1667.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips : 3325.05
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU L2400 @ 1.66GHz
stepping : 8
microcode : 0x39
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bogomips : 3325.05
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

(In reply to Stefan Jensen from comment #21)
> As stated in my comment #5, it is a Intel L2400 @1.66GHz CPU. Full output of
> cpuinfo comes here:
>
>
> # cat /proc/cpuinfo
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 14
> model name : Genuine Intel(R) CPU L2400 @ 1.66GHz
> stepping : 8
> microcode : 0x39
> cpu MHz : 1667.000
> cache size : 2048 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 2
> apicid : 0
> initial apicid : 0
> fdiv_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 10
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon
> bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
> bogomips : 3325.05
> clflush size : 64
> cache_alignment : 64
> address sizes : 32 bits physical, 32 bits virtual
> power management:
>

Same one (or close enough) as in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707257.
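The flags in the cpuinfo output above include vmx but not ept. Since Intel's unrestricted-guest mode depends on EPT, that absence is consistent with the QEMU hint about a machine "without unrestricted mode support", although it is only an indication: it assumes the running kernel reports an `ept` flag in /proc/cpuinfo at all. A minimal sketch of such a check, using a hypothetical function name `cpu_has_flag`:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first "flags" line of a cpuinfo-style
# file lists the given flag. The file defaults to /proc/cpuinfo.
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Fields 1 and 2 are "flags" and ":", so the flag names start at field 3.
    awk -v f="$flag" '
        /^flags/ { for (i = 3; i <= NF; i++) if ($i == f) found = 1; exit }
        END      { exit !found }
    ' "$cpuinfo"
}

# Example:
# cpu_has_flag vmx && echo "VT-x present"
# cpu_has_flag ept || echo "no EPT reported"
```

On a host like the one in this report, the first check would be expected to succeed and the second to report that no EPT flag is present.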