Bug 1632426
| Summary: | Again - Running a VM causes thousands of vmwrite error: reg 401e value 2021 (err 12) | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Gerhard Wiesinger <redhat> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 28 | CC: | airlied, bskeggs, ewk, fedora-kernel-kvm, hdegoede, ichavero, itamar, jarodwilson, jcline, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, nberrehouc, pzeppegno, sean.j.christopherson, steved |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-4.18.14-200.fc28 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-10-18 14:00:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Gerhard Wiesinger
2018-09-24 18:50:08 UTC
Let me add some more information here. First of all the regression I see started in kernel-4.18.8-200.fc28.x86_64, the last working kernel is kernel-4.18.7-200.fc28.x86_64 that I am running now. Secondarily the behavior I see is different. My windows 7 VM goes in pause state and trying to unpause it gives me the error: libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required The issue Gerhard is seeing is a new bug introduced by commit 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings"). Resolving that issue is straightforward, KVM just needs to check whether or not secondary execution controls (VMCS field 401e) is supported before writing said field.
Paolo's pause issue is likely something different. The aforementioned commit exists in all 4.18 releases (https://lkml.org/lkml/2018/6/12/439), and while it could spam the kernel log if the guest VM constantly writes the APIC_BASE MSR, I don't see any way for it to result in a KVM internal error.
Paolo, your bug is most likely the L1TF GFN MMIO issue: https://www.spinics.net/lists/kvm/msg175634.html Same problem with kernel 4.18.10-200.fc28.x86 and CentOS 7 VM. VM doesn't run and same errors in dmesg: [ 95.142808] vmwrite error: reg 401e value 0 (err 12) [ 95.143018] CPU: 0 PID: 1945 Comm: CPU 0/KVM Not tainted 4.18.10-200.fc28.x86_64 #1 [ 95.143286] Hardware name: Dell Inc. PowerEdge 2900/0TM757, BIOS 2.7.0 10/30/2010 [ 95.143535] Call Trace: [ 95.143654] dump_stack+0x5c/0x80 [ 95.143787] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [ 95.144076] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [ 95.144312] kvm_lapic_set_base+0x7b/0x190 [kvm] [ 95.144524] kvm_set_apic_base+0xac/0xd0 [kvm] [ 95.144717] kvm_set_msr_common+0x829/0xc30 [kvm] [ 95.144892] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145078] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [ 95.145256] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145437] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [ 95.145626] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [ 95.145811] handle_wrmsr+0x55/0x140 [kvm_intel] [ 95.146021] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [ 95.146243] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [ 95.146409] ? __switch_to_asm+0x34/0x70 [ 95.146555] ? __switch_to_asm+0x40/0x70 [ 95.146704] ? __switch_to_asm+0x34/0x70 [ 95.146856] ? __switch_to_asm+0x40/0x70 [ 95.147006] ? __switch_to_asm+0x34/0x70 [ 95.147151] ? __switch_to_asm+0x40/0x70 [ 95.147301] ? __switch_to_asm+0x34/0x70 [ 95.147452] ? __switch_to_asm+0x40/0x70 [ 95.147605] ? __switch_to_asm+0x34/0x70 [ 95.147765] ? apic_timer_interrupt+0xa/0x20 [ 95.147928] do_vfs_ioctl+0xa4/0x620 [ 95.148074] ksys_ioctl+0x60/0x90 [ 95.148191] __x64_sys_ioctl+0x16/0x20 [ 95.148332] do_syscall_64+0x5b/0x160 [ 95.148472] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 95.148652] RIP: 0033:0x7fcf4a10bc57 [ 95.148789] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [ 95.149496] RSP: 002b:00007fcf3ad32648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 95.149757] RAX: ffffffffffffffda RBX: 00007fcf53dc7002 RCX: 00007fcf4a10bc57 [ 95.149995] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017 [ 95.150235] RBP: 0000000000000001 R08: 00005637b9610790 R09: 000000000000ffff [ 95.150477] R10: 00005637b95fa4f0 R11: 0000000000000246 R12: 00005637b95f67c0 [ 95.150717] R13: 0000000000000000 R14: 00007fcf53dc6000 R15: 00005637bb2f18b0 Hi Sean Thanks a lot for your analysis. You did not mention when the L1TF GFN MMIO issue was introduce in Fedora so I assume it does not conflict with my saying that 4.18.7-200.fc28.x86_64 works fine for me with windows 7. Fedora 28 instead gives me this: [255239.184016] vmwrite error: reg 401e value 0 (err 12) [255239.184024] CPU: 0 PID: 17246 Comm: CPU 0/KVM Tainted: P OE 4.18.7-200.fc28.x86_64 #1 [255239.184026] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238 09/30/2008 [255239.184027] Call Trace: [255239.184037] dump_stack+0x5c/0x80 [255239.184045] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [255239.184079] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [255239.184098] kvm_lapic_set_base+0x7b/0x190 [kvm] [255239.184116] kvm_set_apic_base+0xac/0xd0 [kvm] [255239.184135] kvm_set_msr_common+0x829/0xc30 [kvm] [255239.184140] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184144] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.184148] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184153] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.184157] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.184161] handle_wrmsr+0x55/0x140 [kvm_intel] [255239.184180] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [255239.184184] ? futex_wake+0x90/0x160 [255239.184187] ? __switch_to_asm+0x40/0x70 [255239.184189] ? __switch_to_asm+0x34/0x70 [255239.184190] ? __switch_to_asm+0x40/0x70 [255239.184192] ? __switch_to_asm+0x40/0x70 [255239.184207] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [255239.184210] ? __switch_to_asm+0x34/0x70 [255239.184211] ? __switch_to_asm+0x40/0x70 [255239.184213] ? __switch_to_asm+0x34/0x70 [255239.184214] ? __switch_to_asm+0x40/0x70 [255239.184216] ? __switch_to_asm+0x34/0x70 [255239.184217] ? __switch_to_asm+0x40/0x70 [255239.184219] ? __switch_to_asm+0x34/0x70 [255239.184220] ? __switch_to_asm+0x40/0x70 [255239.184222] ? __switch_to_asm+0x34/0x70 [255239.184223] ? __switch_to_asm+0x40/0x70 [255239.184226] do_vfs_ioctl+0xa4/0x620 [255239.184228] ksys_ioctl+0x60/0x90 [255239.184230] __x64_sys_ioctl+0x16/0x20 [255239.184233] do_syscall_64+0x5b/0x160 [255239.184235] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [255239.184238] RIP: 0033:0x7f7237138c57 [255239.184239] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [255239.184273] RSP: 002b:00007f7227949648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [255239.184276] RAX: ffffffffffffffda RBX: 00007f7240db2001 RCX: 00007f7237138c57 [255239.184277] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016 [255239.184278] RBP: 0000000000000001 R08: 000055806b275790 R09: 0000000000000001 [255239.184279] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0 [255239.184280] R13: 0000000000000000 R14: 00007f7240db1000 R15: 000055806bfe5240 [255239.297576] vmwrite error: reg 401e value 0 (err 12) [255239.297584] CPU: 1 PID: 17247 Comm: CPU 1/KVM Tainted: P OE 4.18.7-200.fc28.x86_64 #1 [255239.297585] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238 09/30/2008 [255239.297586] Call Trace: [255239.297596] dump_stack+0x5c/0x80 [255239.297605] vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel] [255239.297645] ? kvm_make_all_cpus_request+0x4d/0x70 [kvm] [255239.297664] kvm_lapic_set_base+0x7b/0x190 [kvm] [255239.297682] kvm_set_apic_base+0xac/0xd0 [kvm] [255239.297701] kvm_set_msr_common+0x829/0xc30 [kvm] [255239.297706] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297710] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.297714] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297719] ? vmx_vcpu_run+0x410/0xb80 [kvm_intel] [255239.297723] ? vmx_vcpu_run+0x404/0xb80 [kvm_intel] [255239.297727] handle_wrmsr+0x55/0x140 [kvm_intel] [255239.297746] kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm] [255239.297763] kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm] [255239.297766] ? do_signal+0x1a8/0x610 [255239.297769] ? __fpu__restore_sig+0x75/0x440 [255239.297772] do_vfs_ioctl+0xa4/0x620 [255239.297774] ksys_ioctl+0x60/0x90 [255239.297776] __x64_sys_ioctl+0x16/0x20 [255239.297778] do_syscall_64+0x5b/0x160 [255239.297782] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [255239.297784] RIP: 0033:0x7f7237138c57 [255239.297785] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 [255239.297820] RSP: 002b:00007f7227148648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [255239.297822] RAX: ffffffffffffffda RBX: 000055806c030920 RCX: 00007f7237138c57 [255239.297823] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017 [255239.297824] RBP: 000055806b853a60 R08: 000055806b275790 R09: 000055806b884640 [255239.297825] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0 [255239.297827] R13: 00007ffe19912c0f R14: 00007f7240c66000 R15: 000055806c030920 but it's not a steady stream of errors in dmesg like the old issue had. Fedora seems to run fine (if a bit slower than I remember). Is there a bug open for L1TF GFN MMIO or should I open one? Thanks And of course by just fully reading https://www.spinics.net/lists/kvm/msg175634.html would have confirmed that it's: broken with 4.18.8+ kernels Thanks Any news on when these issues will be fixed? Thanks The fix has been added to the 4.18 and 4.14 stable trees, presumably it'll land in 4.18.14 and 4.14.76. I installed 4.18.14 from updates-testing and I can confirm that it works on my PC. I confirm it works too. Thanks ! I confirm it works too. Thanks ! |