Bug 1879646

Summary: nested AMD/(L0 VMWare) Virtualization fails in CNV 2.4.1 with cpu->kvm_msr_buf->nmsrs (0xe1)
Product: Container Native Virtualization (CNV)
Reporter: Andrea Cervesato <acervesa>
Component: Virtualization
Assignee: sgott
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: high
Priority: unspecified
Version: 2.4.1
CC: cnv-qe-bugs, dgilbert, fdeutsch, kbidarka, mlevitsk, oramraz, sgott
Keywords: TestOnly
Target Release: 2.6.0
Flags: ipinto: needinfo?
Hardware: x86_64
OS: Unspecified
Fixed In Version: virt-operator-container-v2.6.0-79 hco-bundle-registry-container-v2.6.0-287
Last Closed: 2021-03-10 13:47:34 UTC
Type: Bug

Description Andrea Cervesato 2020-09-16 17:32:50 UTC
Description of problem:
Virtualization does not work: VMs fail to start because qemu-kvm aborts while setting MSR 0xe1.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Spawn any VM

Steps to Reproduce:
1. Spawn any VM.


Actual results:
The VM start loops, failing repeatedly with the following error:
```
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:36.625311Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:37.420617Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:38.211343Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:39.013171Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:39.780053Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:40.612615Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:41.438211Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:42.026847Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:42.780289Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:43.551149Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:44.312084Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
```
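A simplified model of the failing assertion, for readers unfamiliar with it. Per the KVM API documentation, the `KVM_SET_MSRS` ioctl returns the number of MSRs it successfully set, stopping at the first one it rejects; QEMU 4.2's `kvm_buf_set_msrs()` reports the entry at that index ("failed to set MSR 0xe1 to 0x0") and then asserts that the count equals `nmsrs`. The Python below is only a sketch of that logic, not QEMU or KVM code; `MsrEntry`, `set_msrs`, `buf_set_msrs`, and the `accepted` set (standing in for whatever the nested host actually accepts) are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MsrEntry:
    index: int  # MSR number, e.g. 0xe1
    data: int   # value QEMU wants to write


def set_msrs(entries: List[MsrEntry], accepted: Set[int]) -> int:
    """Hypothetical model of KVM_SET_MSRS: write entries in order, stop at
    the first MSR the host rejects, and return how many were written."""
    written = 0
    for entry in entries:
        if entry.index not in accepted:
            break
        written += 1
    return written


def buf_set_msrs(entries: List[MsrEntry], accepted: Set[int]) -> None:
    """Hypothetical model of the check in QEMU 4.2's kvm_buf_set_msrs()."""
    ret = set_msrs(entries, accepted)
    if ret != len(entries):
        failed = entries[ret]
        # QEMU logs this via error_report() and then fails a C assert(),
        # which aborts qemu-kvm; raising an exception plays that role here.
        raise AssertionError(
            f"failed to set MSR {failed.index:#x} to {failed.data:#x}"
        )
```

For example, `buf_set_msrs([MsrEntry(0x10, 0), MsrEntry(0xE1, 0)], accepted={0x10})` raises with `failed to set MSR 0xe1 to 0x0`, mirroring the events above: a single rejected MSR is enough to abort the whole VM start.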


Additional info:
This looks like a regression of a problem that was addressed in:
https://access.redhat.com/errata/RHSA-2020:3194
https://bugzilla.redhat.com/show_bug.cgi?id=1847070



We still hit the same problem on kernel 4.18.0-193.14.3.el8_2.x86_64.

# oc version
Server Version: 4.5.8
Kubernetes Version: v1.18.3+6c42de8

# OCP Node version and info
Operating System: Linux
OS Image: Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)
Architecture: AMD64
Kernel Version: 4.18.0-193.14.3.el8_2.x86_64
Boot ID: 91b5f4f6-93c4-44a1-a464-84656fc80a12
Container Runtime: cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
Kubelet Version: v1.18.3+6c42de8
Kube-Proxy Version: v1.18.3+6c42de8

Comment 1 Dr. David Alan Gilbert 2020-09-16 17:46:50 UTC
I suspect this needs Maxim's upstream commit f4cfcd2d5aea4e96c5d483c476f3057b6b7baf6a,
'KVM: x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally'.

Maxim: Do you know if that's in any of the downstream kernels yet?
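Some context for the commit named above, based on our reading of it rather than on anything stated in this report: MSR 0xe1 is `MSR_IA32_UMWAIT_CONTROL`, which only exists on CPUs with the WAITPKG feature (UMONITOR/UMWAIT/TPAUSE); as far as we know the AMD EPYC host here does not implement WAITPKG. Our understanding is that KVM used to decide whether to advertise such an MSR by attempting a test read, which can succeed when KVM itself runs as a guest, and the fix additionally requires WAITPKG. The sketch below models only that before/after filtering; `probe_read_ok`, `has_waitpkg`, and both function names are hypothetical.

```python
from typing import Callable, Iterable, List

MSR_IA32_UMWAIT_CONTROL = 0xE1  # only present on CPUs with WAITPKG

# Hypothetical probe: returns True if a test read of the MSR did not fault.
ProbeFn = Callable[[int], bool]


def advertised_before(candidates: Iterable[int], probe_read_ok: ProbeFn) -> List[int]:
    """Old behaviour, as modelled here: advertise every candidate MSR
    whose probe read succeeds."""
    return [msr for msr in candidates if probe_read_ok(msr)]


def advertised_after(candidates: Iterable[int], probe_read_ok: ProbeFn,
                     has_waitpkg: bool) -> List[int]:
    """Fixed behaviour, as modelled here: also require WAITPKG before
    advertising MSR_IA32_UMWAIT_CONTROL."""
    result = []
    for msr in candidates:
        if msr == MSR_IA32_UMWAIT_CONTROL and not has_waitpkg:
            continue  # skipped no matter what the probe reports
        if probe_read_ok(msr):
            result.append(msr)
    return result
```

With a probe that never faults, `advertised_before([0x10, 0xE1], lambda msr: True)` keeps 0xe1, so QEMU would later try to restore it and hit the failed write; `advertised_after([0x10, 0xE1], lambda msr: True, has_waitpkg=False)` keeps only 0x10.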

Comment 3 Dr. David Alan Gilbert 2020-09-16 17:49:59 UTC
Andrea said this is running nested, hosted on VMware:

  VMware ESXi, 6.7.0, 15160138
  KRPA-U16 Series
  AMD EPYC 7502P 32-Core Processor

Comment 19 Andrea Cervesato 2021-01-28 09:53:22 UTC
I've finally found the time to test 4.7 (rc4) with the following kernel:

```
Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
```

The problem is gone.

Comment 20 Andrea Cervesato 2021-01-28 09:53:35 UTC
I've finally found the time to test 4.6 (rc4) with the following kernel:

```
Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
```

The problem is gone.
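To see which side of this a given node is on, its `uname -r` can be compared with the two kernels mentioned in this bug: 4.18.0-193.14.3.el8_2 (still affected, per the description) and 4.18.0-240.10.1.el8_3 (problem gone, per comments 19 and 20). The exact build that first carries the fix is not stated here, so this sketch reports anything in between as unknown and assumes later builds keep the fix. `kernel_key` and `classify_kernel` are hypothetical helpers, and the parsing assumes the RHEL `<version>-<release>.<dist>.<arch>` layout.

```python
import re
from typing import Tuple

# Kernels named in this bug report.
KNOWN_AFFECTED = "4.18.0-193.14.3.el8_2.x86_64"
KNOWN_FIXED = "4.18.0-240.10.1.el8_3.x86_64"

_RELEASE_RE = re.compile(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def kernel_key(release: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Turn '4.18.0-240.10.1.el8_3.x86_64' into ((4, 18, 0), (240, 10, 1)).

    Only the numeric version and release parts are compared; the dist tag
    (el8_2, el8_3) and the architecture are ignored.
    """
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version, rel = match.groups()
    return (tuple(int(p) for p in version.split(".")),
            tuple(int(p) for p in rel.split(".")))


def classify_kernel(release: str) -> str:
    """Return 'affected', 'fixed' or 'unknown' relative to the two
    kernels reported in this bug."""
    key = kernel_key(release)
    if key == kernel_key(KNOWN_AFFECTED):
        return "affected"
    if key >= kernel_key(KNOWN_FIXED):
        return "fixed"  # assumes later builds keep the fix
    return "unknown"
```

On a node, `classify_kernel(platform.release())` would report `affected` for the kernel in the description and `fixed` for the one tested in comments 19 and 20.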

Comment 21 Israel Pinto 2021-01-28 09:54:17 UTC
(In reply to Andrea Cervesato from comment #20)
> I've finally found the time to test 4.6 (rc4) and with:
> 
> ```
> Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed
> Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
> ```
> 
> The problem is gone.

Great update, Andrea. Moving to VERIFIED.

Comment 22 Dan Kenigsberg 2021-03-10 13:47:34 UTC
Closed as part of advisory https://access.redhat.com/errata/RHSA-2021:0799