Bug 1632426 - Again - Running a VM causes thousands of vmwrite error: reg 401e value 2021 (err 12)
Summary: Again - Running a VM causes thousands of vmwrite error: reg 401e value 2021 (...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-09-24 18:50 UTC by Gerhard Wiesinger
Modified: 2018-10-20 14:33 UTC (History)
21 users (show)

Fixed In Version: kernel-4.18.14-200.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-18 14:00:02 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Gerhard Wiesinger 2018-09-24 18:50:08 UTC
Description of problem:
kernel panics and slowdowns
Regression of https://bugzilla.redhat.com/show_bug.cgi?id=1566258

Version-Release number of selected component (if applicable):
4.18.9-200.fc28.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start a virtual machine under virt-manager (windows 7 or XP)
2. See it running slowly
3. Check dmesg and see zillions of warnings

Actual results:
vmwrite error: reg 401e value 0 (err 12)
CPU: 0 PID: 2994 Comm: CPU 0/KVM Not tainted 4.18.9-200.fc28.x86_64 #1
Call Trace:
 dump_stack+0x5c/0x80
 vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel]
 ? kvm_make_all_cpus_request+0x4d/0x70 [kvm]
 kvm_lapic_set_base+0x7b/0x190 [kvm]
 kvm_set_apic_base+0xac/0xd0 [kvm]
 kvm_set_msr_common+0x829/0xc30 [kvm]
 ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
 ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
 ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
 ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
 ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
 handle_wrmsr+0x55/0x140 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm]
 ? pollwake+0x74/0x90
 ? wake_up_q+0x70/0x70
 ? __wake_up_common+0x7a/0x190
 kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm]
 ? wake_up_q+0x70/0x70
 do_vfs_ioctl+0xa4/0x620
 ksys_ioctl+0x60/0x90
 ? ksys_write+0x9c/0xb0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f0a0be6ec57


Expected results:
Runs fast with no warnings

Additional info:
Old processor, same one hit previously by kernel regression
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Quad CPU           @ 2.66GHz

Comment 1 pzeppegno 2018-09-28 21:20:38 UTC
Let me add some more information here.

First of all the regression I see started in kernel-4.18.8-200.fc28.x86_64, the last working kernel is kernel-4.18.7-200.fc28.x86_64 that I am running now.

Secondarily the behavior I see is different. My windows 7 VM goes in pause state and trying to unpause it gives me the error:

libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

Comment 2 Sean Christopherson 2018-10-01 13:21:15 UTC
The issue Gerhard is seeing is a new bug introduced by commit 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings").  Resolving that issue is straightforward, KVM just needs to check whether or not secondary execution controls (VMCS field 401e) is supported before writing said field.

Paolo's pause issue is likely something different.  The aforementioned commit exists in all 4.18 releases (https://lkml.org/lkml/2018/6/12/439), and while it could spam the kernel log if the guest VM constantly writes the APIC_BASE MSR, I don't see any way for it to result in a KVM internal error.

Comment 3 Sean Christopherson 2018-10-01 19:09:34 UTC
Paolo, your bug is most likely the L1TF GFN MMIO issue: https://www.spinics.net/lists/kvm/msg175634.html

Comment 4 Nicolas Berrehouc 2018-10-01 19:54:12 UTC
Same problem with kernel 4.18.10-200.fc28.x86 and CentOS 7 VM.
VM doesn't run and same errors in dmesg:

[   95.142808] vmwrite error: reg 401e value 0 (err 12)
[   95.143018] CPU: 0 PID: 1945 Comm: CPU 0/KVM Not tainted 4.18.10-200.fc28.x86_64 #1
[   95.143286] Hardware name: Dell Inc. PowerEdge 2900/0TM757, BIOS 2.7.0 10/30/2010
[   95.143535] Call Trace:
[   95.143654]  dump_stack+0x5c/0x80
[   95.143787]  vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel]
[   95.144076]  ? kvm_make_all_cpus_request+0x4d/0x70 [kvm]
[   95.144312]  kvm_lapic_set_base+0x7b/0x190 [kvm]
[   95.144524]  kvm_set_apic_base+0xac/0xd0 [kvm]
[   95.144717]  kvm_set_msr_common+0x829/0xc30 [kvm]
[   95.144892]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[   95.145078]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[   95.145256]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[   95.145437]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[   95.145626]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[   95.145811]  handle_wrmsr+0x55/0x140 [kvm_intel]
[   95.146021]  kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm]
[   95.146243]  kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm]
[   95.146409]  ? __switch_to_asm+0x34/0x70
[   95.146555]  ? __switch_to_asm+0x40/0x70
[   95.146704]  ? __switch_to_asm+0x34/0x70
[   95.146856]  ? __switch_to_asm+0x40/0x70
[   95.147006]  ? __switch_to_asm+0x34/0x70
[   95.147151]  ? __switch_to_asm+0x40/0x70
[   95.147301]  ? __switch_to_asm+0x34/0x70
[   95.147452]  ? __switch_to_asm+0x40/0x70
[   95.147605]  ? __switch_to_asm+0x34/0x70
[   95.147765]  ? apic_timer_interrupt+0xa/0x20
[   95.147928]  do_vfs_ioctl+0xa4/0x620
[   95.148074]  ksys_ioctl+0x60/0x90
[   95.148191]  __x64_sys_ioctl+0x16/0x20
[   95.148332]  do_syscall_64+0x5b/0x160
[   95.148472]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   95.148652] RIP: 0033:0x7fcf4a10bc57
[   95.148789] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 
[   95.149496] RSP: 002b:00007fcf3ad32648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   95.149757] RAX: ffffffffffffffda RBX: 00007fcf53dc7002 RCX: 00007fcf4a10bc57
[   95.149995] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017
[   95.150235] RBP: 0000000000000001 R08: 00005637b9610790 R09: 000000000000ffff
[   95.150477] R10: 00005637b95fa4f0 R11: 0000000000000246 R12: 00005637b95f67c0
[   95.150717] R13: 0000000000000000 R14: 00007fcf53dc6000 R15: 00005637bb2f18b0

Comment 5 pzeppegno 2018-10-01 20:15:16 UTC
Hi Sean
Thanks a lot for your analysis. You did not mention when the L1TF GFN MMIO issue was introduce in Fedora so I assume it does not conflict with my saying that 4.18.7-200.fc28.x86_64 works fine for me with windows 7. Fedora 28 instead gives me this:

[255239.184016] vmwrite error: reg 401e value 0 (err 12)
[255239.184024] CPU: 0 PID: 17246 Comm: CPU 0/KVM Tainted: P           OE     4.18.7-200.fc28.x86_64 #1
[255239.184026] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238    09/30/2008
[255239.184027] Call Trace:
[255239.184037]  dump_stack+0x5c/0x80
[255239.184045]  vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel]
[255239.184079]  ? kvm_make_all_cpus_request+0x4d/0x70 [kvm]
[255239.184098]  kvm_lapic_set_base+0x7b/0x190 [kvm]
[255239.184116]  kvm_set_apic_base+0xac/0xd0 [kvm]
[255239.184135]  kvm_set_msr_common+0x829/0xc30 [kvm]
[255239.184140]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.184144]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[255239.184148]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.184153]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[255239.184157]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.184161]  handle_wrmsr+0x55/0x140 [kvm_intel]
[255239.184180]  kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm]
[255239.184184]  ? futex_wake+0x90/0x160
[255239.184187]  ? __switch_to_asm+0x40/0x70
[255239.184189]  ? __switch_to_asm+0x34/0x70
[255239.184190]  ? __switch_to_asm+0x40/0x70
[255239.184192]  ? __switch_to_asm+0x40/0x70
[255239.184207]  kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm]
[255239.184210]  ? __switch_to_asm+0x34/0x70
[255239.184211]  ? __switch_to_asm+0x40/0x70
[255239.184213]  ? __switch_to_asm+0x34/0x70
[255239.184214]  ? __switch_to_asm+0x40/0x70
[255239.184216]  ? __switch_to_asm+0x34/0x70
[255239.184217]  ? __switch_to_asm+0x40/0x70
[255239.184219]  ? __switch_to_asm+0x34/0x70
[255239.184220]  ? __switch_to_asm+0x40/0x70
[255239.184222]  ? __switch_to_asm+0x34/0x70
[255239.184223]  ? __switch_to_asm+0x40/0x70
[255239.184226]  do_vfs_ioctl+0xa4/0x620
[255239.184228]  ksys_ioctl+0x60/0x90
[255239.184230]  __x64_sys_ioctl+0x16/0x20
[255239.184233]  do_syscall_64+0x5b/0x160
[255239.184235]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[255239.184238] RIP: 0033:0x7f7237138c57
[255239.184239] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 
[255239.184273] RSP: 002b:00007f7227949648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[255239.184276] RAX: ffffffffffffffda RBX: 00007f7240db2001 RCX: 00007f7237138c57
[255239.184277] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016
[255239.184278] RBP: 0000000000000001 R08: 000055806b275790 R09: 0000000000000001
[255239.184279] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0
[255239.184280] R13: 0000000000000000 R14: 00007f7240db1000 R15: 000055806bfe5240
[255239.297576] vmwrite error: reg 401e value 0 (err 12)
[255239.297584] CPU: 1 PID: 17247 Comm: CPU 1/KVM Tainted: P           OE     4.18.7-200.fc28.x86_64 #1
[255239.297585] Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238    09/30/2008
[255239.297586] Call Trace:
[255239.297596]  dump_stack+0x5c/0x80
[255239.297605]  vmx_set_virtual_apic_mode+0x1c5/0x270 [kvm_intel]
[255239.297645]  ? kvm_make_all_cpus_request+0x4d/0x70 [kvm]
[255239.297664]  kvm_lapic_set_base+0x7b/0x190 [kvm]
[255239.297682]  kvm_set_apic_base+0xac/0xd0 [kvm]
[255239.297701]  kvm_set_msr_common+0x829/0xc30 [kvm]
[255239.297706]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.297710]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[255239.297714]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.297719]  ? vmx_vcpu_run+0x410/0xb80 [kvm_intel]
[255239.297723]  ? vmx_vcpu_run+0x404/0xb80 [kvm_intel]
[255239.297727]  handle_wrmsr+0x55/0x140 [kvm_intel]
[255239.297746]  kvm_arch_vcpu_ioctl_run+0x56e/0x1a10 [kvm]
[255239.297763]  kvm_vcpu_ioctl+0x2b8/0x5e0 [kvm]
[255239.297766]  ? do_signal+0x1a8/0x610
[255239.297769]  ? __fpu__restore_sig+0x75/0x440
[255239.297772]  do_vfs_ioctl+0xa4/0x620
[255239.297774]  ksys_ioctl+0x60/0x90
[255239.297776]  __x64_sys_ioctl+0x16/0x20
[255239.297778]  do_syscall_64+0x5b/0x160
[255239.297782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[255239.297784] RIP: 0033:0x7f7237138c57
[255239.297785] Code: 00 00 90 48 8b 05 49 82 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 82 2c 00 f7 d8 64 89 01 48 
[255239.297820] RSP: 002b:00007f7227148648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[255239.297822] RAX: ffffffffffffffda RBX: 000055806c030920 RCX: 00007f7237138c57
[255239.297823] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017
[255239.297824] RBP: 000055806b853a60 R08: 000055806b275790 R09: 000055806b884640
[255239.297825] R10: 0000000000000000 R11: 0000000000000246 R12: 000055806b25b7c0
[255239.297827] R13: 00007ffe19912c0f R14: 00007f7240c66000 R15: 000055806c030920


but it's not a steady stream of errors in dmesg like the old issue had. Fedora seems to run fine (if a bit slower than I remember).

Is there a bug open for L1TF GFN MMIO or should I open one?

Thanks

Comment 6 pzeppegno 2018-10-01 20:24:49 UTC
And of course by just fully reading https://www.spinics.net/lists/kvm/msg175634.html would have confirmed that it's:

broken with 4.18.8+ kernels

Thanks

Comment 7 pzeppegno 2018-10-06 13:47:27 UTC
Any news on when these issues will be fixed?

Thanks

Comment 8 Sean Christopherson 2018-10-11 13:19:30 UTC
The fix has been added to the 4.18 and 4.14 stable trees, presumably it'll land in 4.18.14 and 4.14.76.

Comment 9 pzeppegno 2018-10-17 18:53:51 UTC
I installed 4.18.14 from updates-testing and I can confirm that it works on my PC.

Comment 10 Nicolas Berrehouc 2018-10-20 10:30:40 UTC
I confirm it works too. Thanks !

Comment 11 Gerhard Wiesinger 2018-10-20 14:33:29 UTC
I confirm it works too. Thanks !


Note You need to log in before you can comment on or make changes to this bug.