Bug 1879646 - nested AMD/(L0 VMWare) Virtualization fails in CNV 2.4.1 with cpu->kvm_msr_buf->nmsrs (0xe1) [NEEDINFO]
Summary: nested AMD/(L0 VMWare) Virtualization fails in CNV 2.4.1 with cpu->kvm_msr_bu...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.4.1
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: sgott
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-16 17:32 UTC by Andrea Cervesato
Modified: 2021-03-10 13:47 UTC (History)

Fixed In Version: virt-operator-container-v2.6.0-79 hco-bundle-registry-container-v2.6.0-287
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 13:47:34 UTC
Target Upstream Version:
Embargoed:
ipinto: needinfo?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1847070 | 0 | urgent | CLOSED | vmi cannot be scheduled , qemu-kvm core dump | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2020:3194 | 0 | None | None | None | 2020-09-16 17:32:50 UTC

Description Andrea Cervesato 2020-09-16 17:32:50 UTC
Description of problem:
Virtualization does not work: no VM can be started.

Version-Release number of selected component (if applicable):
2.4.1

How reproducible:
Spawn any VM

Steps to Reproduce:
1. Spawn any VM


Actual results:
The machine loops with the following error:
```
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:36.625311Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:37.420617Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:38.211343Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:39.013171Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:39.780053Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:40.612615Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:41.438211Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2020-09-16T17:21:42.026847Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:42.780289Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:43.551149Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
0s          Warning   SyncFailed                     virtualmachineinstance/vm-example                               (combined from similar events): server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2020-09-16T17:21:44.312084Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\nqemu-kvm: /builddir/build/BUILD/qemu-4.2.0/target/i386/kvm.c:2695: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.')"
```
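
The repeated events above all carry the same QEMU message. As a minimal sketch (not part of the original report), the Python below shows one way to pull the failing MSR number and value out of such event text and count how often each pair occurs. The function name `summarize_msr_failures` and the sample string are illustrative assumptions, and the regular expression only targets the `failed to set MSR` wording seen in these events.

```python
import re
from collections import Counter

# Matches the QEMU wording seen in the events above, e.g.
# "failed to set MSR 0xe1 to 0x0". Other wordings are not handled.
MSR_FAILURE = re.compile(r"failed to set MSR (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)")


def summarize_msr_failures(event_text):
    """Count occurrences of each (msr, value) pair in the given event text."""
    return Counter(MSR_FAILURE.findall(event_text))


# Hypothetical sample built from two of the messages in this report.
sample = (
    "2020-09-16T17:21:36.625311Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\n"
    "2020-09-16T17:21:37.420617Z qemu-kvm: error: failed to set MSR 0xe1 to 0x0\n"
)

if __name__ == "__main__":
    for (msr, value), count in summarize_msr_failures(sample).items():
        print(f"MSR {msr} <- {value}: {count} failure(s)")
```

Feeding it the saved event output of an affected namespace would show at a glance whether every failure concerns the same MSR, as it does here (0xe1).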


Additional info:
Looks like a regression of a problem addressed here:
https://access.redhat.com/errata/RHSA-2020:3194
https://bugzilla.redhat.com/show_bug.cgi?id=1847070



We still have the same problem on kernel 4.18.0-193.14.3.el8_2.x86_64.

# oc version
Server Version: 4.5.8
Kubernetes Version: v1.18.3+6c42de8

# OCP Node version and info
Operating System: Linux
OS Image: Red Hat Enterprise Linux CoreOS 45.82.202008290529-0 (Ootpa)
Architecture: AMD64
Kernel Version: 4.18.0-193.14.3.el8_2.x86_64
Boot ID: 91b5f4f6-93c4-44a1-a464-84656fc80a12
Container Runtime: cri-o://1.18.3-11.rhaos4.5.gite5bcc71.el8
Kubelet Version: v1.18.3+6c42de8
Kube-Proxy Version: v1.18.3+6c42de8

Comment 1 Dr. David Alan Gilbert 2020-09-16 17:46:50 UTC
I suspect this needs Maxim's upstream f4cfcd2d5aea4e96c5d483c476f3057b6b7baf6a
'KVM: x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally'

Maxim: Do you know if that's in any of the downstream kernels yet?

Comment 3 Dr. David Alan Gilbert 2020-09-16 17:49:59 UTC
Andrea said this was running nested on VMware:

  VMware ESXi, 6.7.0, 15160138
  KRPA-U16 Series
  AMD EPYC 7502P 32-Core Processor

Comment 19 Andrea Cervesato 2021-01-28 09:53:22 UTC
I've finally found the time to test 4.7 (rc4) and with:

```
Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
```

The problem is gone.

Comment 20 Andrea Cervesato 2021-01-28 09:53:35 UTC
I've finally found the time to test 4.6 (rc4) and with:

```
Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
```

The problem is gone.
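
For anyone checking other nodes, here is a rough Python sketch that compares a node's kernel release string against the release reported as working in this comment. It is only an assumption-laden helper: `kernel_release_key` and `at_least_reported_good` are hypothetical names, the comparison is a naive numeric one, and this bug does not state which kernel build first carried the fix, so an older stream may also have received it.

```python
import re

# Kernel release reported as working in this bug (comments 19 and 20).
REPORTED_GOOD = "4.18.0-240.10.1.el8_3.x86_64"


def kernel_release_key(release):
    """Split e.g. '4.18.0-193.14.3.el8_2.x86_64' into comparable int tuples.

    Returns ((4, 18, 0), (193, 14, 3)); the dist tag and arch are ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def at_least_reported_good(release):
    """True if 'release' is numerically >= the release reported as working."""
    return kernel_release_key(release) >= kernel_release_key(REPORTED_GOOD)


if __name__ == "__main__":
    for rel in ("4.18.0-193.14.3.el8_2.x86_64", REPORTED_GOOD):
        print(rel, at_least_reported_good(rel))
```

A `False` result does not prove a node is affected; it only means the kernel is older than the one tested here.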

Comment 21 Israel Pinto 2021-01-28 09:54:17 UTC
(In reply to Andrea Cervesato from comment #20)
> I've finally found the time to test 4.6 (rc4) and with:
> 
> ```
> Linux worker5.lab03.ipa.mylab.local 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Wed
> Dec 16 03:30:52 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
> ```
> 
> The problem is gone.

Great update, Andrea. Moving to verified.

Comment 22 Dan Kenigsberg 2021-03-10 13:47:34 UTC
Closed as part of advisory https://access.redhat.com/errata/RHSA-2021:0799

