Description of problem: kernel panics and slowdowns Regression of https://bugzilla.redhat.com/show_bug.cgi?id=1566258 Version-Release number of selected component (if applicable): 4.18.9-200.fc28.x86_64 How reproducible: always Steps to Reproduce: 1. Start a virtual machine under virt-manager (windows 7 or XP) 2. See it running slowly 3. Check dmesg and see zillions of warnings Actual results: vmwrite error: reg 401e value 0 (err 12) CPU: 0 PID: 2994 Comm: CPU 0/KVM Not tainted 4.18.9-200.fc28.x86_64 #1 Call Trace: dump_stack+0x5c/0x80 vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] kvm_lapic_set_base+0x7b/0x190 [kvm] kvm_set_apic_base+0xac/0xd0 [kvm] kvm_set_msr_common+0x829/0xc30 [kvm] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] handle_wrmsr+0x55/0x140 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] ? pollwake+0x74/0x90 ? wake_up_q+0x70/0x70 ? __wake_up_common+0x7a/0x190 kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] ? wake_up_q+0x70/0x70 do_vfs_ioctl+0xa4/0x620 ksys_ioctl+0x60/0x90 ? ksys_write+0x9c/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f0a0be6ec57 Expected results: Runs fast with no warnings Additional info: Old processor, same one hit previously by kernel regression vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
Let me add some more information here. First of all the regression I see started in kernel-4.18.8-200.fc28.x86_64, the last working kernel is kernel-4.18.7-200.fc28.x86_64 that I am running now. Secondarily the behavior I see is different. My windows 7 VM goes in pause state and trying to unpause it gives me the error: libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required
The issue Gerhard is seeing is a new bug introduced by commit 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings"). Resolving that issue is straightforward, KVM just needs to check whether or not secondary execution controls (VMCS field 401e) is supported before writing said field. Paolo's pause issue is likely something different. The aforementioned commit exists in all 4.18 releases (https://lkml.org/lkml/2018/6/12/439), and while it could spam the kernel log if the guest VM constantly writes the APIC_BASE MSR, I don't see any way for it to result in a KVM internal error.
Paolo, your bug is most likely the L1TF GFN MMIO issue: https://www.spinics.net/lists/kvm/msg175634.html
Same problem with kernel 4.18.10-200.fc28.x86 and CentOS 7 VM. VM doesn't run and same errors in dmesg: [ 95.142808] vmwrite error: reg 401e value 0 (err 12) [ 95.143018] CPU: 0 PID: 1945 Comm: CPU 0/KVM Not tainted 4.18.10-200.fc28.x86_64 #1 [ 95.143286] Hardware name: Dell Inc. PowerEdge 2900/0TM757, BIOS 2.7.0 10/30/2010 [ 95.143535] Call Trace: [ 95.143654] dump_stack+0x5c/0x80 [ 95.143787] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [ 95.144076] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [ 95.144312] kvm_lapic_set_base+0x7b/0x190 [kvm] [ 95.144524] kvm_set_apic_base+0xac/0xd0 [kvm] [ 95.144717] kvm_set_msr_common+0x829/0xc30 [kvm] [ 95.144892] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145078] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [ 95.145256] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145437] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [ 95.145626] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145811] handle_wrmsr+0x55/0x140 [kvm_intel] [ 95.146021] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [ 95.146243] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [ 95.146409] ? __switch_to_asm+0x34/0x70 [ 95.146555] ? __switch_to_asm+0x40/0x70 [ 95.146704] ? __switch_to_asm+0x34/0x70 [ 95.146856] ? __switch_to_asm+0x40/0x70 [ 95.147006] ? __switch_to_asm+0x34/0x70 [ 95.147151] ? __switch_to_asm+0x40/0x70 [ 95.147301] ? __switch_to_asm+0x34/0x70 [ 95.147452] ? __switch_to_asm+0x40/0x70 [ 95.147605] ? __switch_to_asm+0x34/0x70 [ 95.147765] ? apic_timer_interrupt+0xa/0x20 [ 95.147928] do_vfs_ioctl+0xa4/0x620 [ 95.148074] ksys_ioctl+0x60/0x90 [ 95.148191] __x64_sys_ioctl+0x16/0x20 [ 95.148332] do_syscall_64+0x5b/0x160 [ 95.148472] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 95.148652] RIP: 0033:0x7fcf4a10bc57 [ 95.148789] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [ 95.149496] RSP: 002b:00007fcf3ad32648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 95.149757] RAX: ffffffffffffffda RBX: 00007fcf53dc7002 RCX: 00007fcf4a10bc57 [ 95.149995] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017 [ 95.150235] RBP: 0000000000000001 R08: 00005637b9610790 R09: 000000000000ffff [ 95.150477] R10: 00005637b95fa4f0 R11: 0000000000000246 R12: 00005637b95f67c0 [ 95.150717] R13: 0000000000000000 R14: 00007fcf53dc6000 R15: 00005637bb2f18b0
Hi Sean Thanks a lot for your analysis. You did not mention when the L1TF GFN MMIO issue was introduce in Fedora so I assume it does not conflict with my saying that 4.18.7-200.fc28.x86_64 works fine for me with windows 7. Fedora 28 instead gives me this: [255239.184016] vmwrite error: reg 401e value 0 (err 12) [255239.184024] CPU: 0 PID: 17246 Comm: CPU 0/KVM Tainted: P OE 4.18.7-200.fc28.x86_64 #1 [255239.184026] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238 09/30/2008 [255239.184027] Call Trace: [255239.184037] dump_stack+0x5c/0x80 [255239.184045] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [255239.184079] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [255239.184098] kvm_lapic_set_base+0x7b/0x190 [kvm] [255239.184116] kvm_set_apic_base+0xac/0xd0 [kvm] [255239.184135] kvm_set_msr_common+0x829/0xc30 [kvm] [255239.184140] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184144] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.184148] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184153] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.184157] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184161] handle_wrmsr+0x55/0x140 [kvm_intel] [255239.184180] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [255239.184184] ? futex_wake+0x90/0x160 [255239.184187] ? __switch_to_asm+0x40/0x70 [255239.184189] ? __switch_to_asm+0x34/0x70 [255239.184190] ? __switch_to_asm+0x40/0x70 [255239.184192] ? __switch_to_asm+0x40/0x70 [255239.184207] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [255239.184210] ? __switch_to_asm+0x34/0x70 [255239.184211] ? __switch_to_asm+0x40/0x70 [255239.184213] ? __switch_to_asm+0x34/0x70 [255239.184214] ? __switch_to_asm+0x40/0x70 [255239.184216] ? __switch_to_asm+0x34/0x70 [255239.184217] ? __switch_to_asm+0x40/0x70 [255239.184219] ? __switch_to_asm+0x34/0x70 [255239.184220] ? __switch_to_asm+0x40/0x70 [255239.184222] ? __switch_to_asm+0x34/0x70 [255239.184223] ? __switch_to_asm+0x40/0x70 [255239.184226] do_vfs_ioctl+0xa4/0x620 [255239.184228] ksys_ioctl+0x60/0x90 [255239.184230] __x64_sys_ioctl+0x16/0x20 [255239.184233] do_syscall_64+0x5b/0x160 [255239.184235] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [255239.184238] RIP: 0033:0x7f7237138c57 [255239.184239] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [255239.184273] RSP: 002b:00007f7227949648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [255239.184276] RAX: ffffffffffffffda RBX: 00007f7240db2001 RCX: 00007f7237138c57 [255239.184277] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016 [255239.184278] RBP: 0000000000000001 R08: 000055806b275790 R09: 0000000000000001 [255239.184279] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0 [255239.184280] R13: 0000000000000000 R14: 00007f7240db1000 R15: 000055806bfe5240 [255239.297576] vmwrite error: reg 401e value 0 (err 12) [255239.297584] CPU: 1 PID: 17247 Comm: CPU 1/KVM Tainted: P OE 4.18.7-200.fc28.x86_64 #1 [255239.297585] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238 09/30/2008 [255239.297586] Call Trace: [255239.297596] dump_stack+0x5c/0x80 [255239.297605] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [255239.297645] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [255239.297664] kvm_lapic_set_base+0x7b/0x190 [kvm] [255239.297682] kvm_set_apic_base+0xac/0xd0 [kvm] [255239.297701] kvm_set_msr_common+0x829/0xc30 [kvm] [255239.297706] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297710] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.297714] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297719] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.297723] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297727] handle_wrmsr+0x55/0x140 [kvm_intel] [255239.297746] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [255239.297763] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [255239.297766] ? do_signal+0x1a8/0x610 [255239.297769] ? __fpu__restore_sig+0x75/0x440 [255239.297772] do_vfs_ioctl+0xa4/0x620 [255239.297774] ksys_ioctl+0x60/0x90 [255239.297776] __x64_sys_ioctl+0x16/0x20 [255239.297778] do_syscall_64+0x5b/0x160 [255239.297782] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [255239.297784] RIP: 0033:0x7f7237138c57 [255239.297785] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [255239.297820] RSP: 002b:00007f7227148648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [255239.297822] RAX: ffffffffffffffda RBX: 000055806c030920 RCX: 00007f7237138c57 [255239.297823] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017 [255239.297824] RBP: 000055806b853a60 R08: 000055806b275790 R09: 000055806b884640 [255239.297825] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0 [255239.297827] R13: 00007ffe19912c0f R14: 00007f7240c66000 R15: 000055806c030920 but it's not a steady stream of errors in dmesg like the old issue had. Fedora seems to run fine (if a bit slower than I remember). Is there a bug open for L1TF GFN MMIO or should I open one? Thanks
And of course by just fully reading https://www.spinics.net/lists/kvm/msg175634.html would have confirmed that it's: broken with 4.18.8+ kernels Thanks
Any news on when these issues will be fixed? Thanks
The fix has been added to the 4.18 and 4.14 stable trees, presumably it'll land in 4.18.14 and 4.14.76.
I installed 4.18.14 from updates-testing and I can confirm that it works on my PC.
I confirm it works too. Thanks !