Bug 1689362 - ovirt does not respect domcapabilities
Summary: ovirt does not respect domcapabilities
Status: POST
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high (1 vote)
Target Milestone: ---
Target Release: ---
Assignee: nobody nobody
QA Contact: Liran Rotenberg
URL:
Whiteboard:
Keywords:
Duplicates: 1689361
Depends On:
Blocks:
Reported: 2019-03-15 17:40 UTC by Hetz Ben Hamo
Modified: 2019-03-26 09:16 UTC


Attachments
engine log (201.38 KB, application/gzip)
2019-03-16 01:26 UTC, Hetz Ben Hamo
VDSM log (42.38 KB, application/gzip)
2019-03-16 01:26 UTC, Hetz Ben Hamo
Dump XML as requested (7.70 KB, application/zip)
2019-03-16 11:13 UTC, Hetz Ben Hamo
VDSM logs after running hosted-engine deployment (501.25 KB, application/zip)
2019-03-16 13:13 UTC, Hetz Ben Hamo


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 98728 master POST nestedvt: enable the 'monitor' flag for AMD CPUs 2019-03-21 00:09 UTC

Description Hetz Ben Hamo 2019-03-15 17:40:54 UTC
(I'm not sure whether the component or the team is correct. If not, please redirect it.)

I'm trying to run oVirt in nested virtualization on various AMD Zen/Zen+ based CPUs (Ryzen, Threadripper, EPYC).

In nested virtualization mode, creating or launching a VM fails with a complaint that the "monitor" flag is missing.

libvirt's domcapabilities shows that the policy for the monitor feature is 'disable', which is correct (I checked against other virtualization solutions), but oVirt doesn't respect the domcapabilities.

Could someone please disable the monitor flag check? The flag cannot be enabled, and this is not a bug in the CPU or in KVM.

Comment 1 Ryan Barry 2019-03-16 00:16:15 UTC
*** Bug 1689361 has been marked as a duplicate of this bug. ***

Comment 2 Ryan Barry 2019-03-16 00:25:43 UTC
Please attach logs (engine.log, libvirt logs, qemu logs). We don't directly check for flags outside of setting a CPU model. Is this coming from qemu?

Is vdsm-hook-nestedvt in use?

Comment 3 Hetz Ben Hamo 2019-03-16 01:26 UTC
Created attachment 1544682
engine log

Comment 4 Hetz Ben Hamo 2019-03-16 01:26 UTC
Created attachment 1544683
VDSM log

Comment 5 Hetz Ben Hamo 2019-03-16 01:34:03 UTC
I don't see any libvirt or qemu logs. Where are they? I'm enclosing both the vdsm and engine logs, which show the error.

Exact message is: 6464: error : virCPUx86Compare:1731 : the CPU is incompatible with host CPU: Host CPU does not provide required features: monitor

Output of virsh domcapabilities:

# virsh domcapabilities | grep mon
<feature policy='disable' name='monitor'/>
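
For a fuller view than grep gives, the feature policies of the host-model CPU can be pulled out of the domcapabilities XML programmatically. The following is a minimal sketch, not part of the original report: the sample XML is an abbreviated, hypothetical stand-in for real `virsh domcapabilities` output (the model name and the x2apic entry are illustrative only), and `host_model_features` is a helper name introduced here.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical sample. Real `virsh domcapabilities` output is
# much longer; only the 'monitor' line is taken from this report.
SAMPLE = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='disable' name='monitor'/>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_model_features(xml_text):
    """Return {feature name: policy} for the host-model CPU mode."""
    root = ET.fromstring(xml_text)
    features = {}
    for mode in root.findall("./cpu/mode"):
        if mode.get("name") != "host-model":
            continue
        for feat in mode.findall("feature"):
            features[feat.get("name")] = feat.get("policy")
    return features


features = host_model_features(SAMPLE)
# Features libvirt says must NOT be exposed to guests on this host.
disabled = sorted(name for name, policy in features.items() if policy == "disable")
print(disabled)
```

On a real host the XML would come from saving the output of `virsh domcapabilities` to a file and passing its contents to the helper.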

Comment 6 Hetz Ben Hamo 2019-03-16 01:44:39 UTC
Forgot to mention: yes, I installed vdsm-hook-nestedvt and checked that it appears in the host hooks (it does).

Comment 7 Ryan Barry 2019-03-16 01:45:33 UTC
So, that message comes directly from libvirt.

Libvirt and qemu logs will be on the host the VM was scheduled on before it failed (likely to be the same host as the vdsm logs). Both are under /var/log/

Is vdsm-hook-nestedvt installed?

Comment 8 Hetz Ben Hamo 2019-03-16 01:54:03 UTC
Yes, vdsm-hook-nestedvt is installed and running.

/var/log/libvirt/qemu doesn't help much - it has the VM log which only shows:

cat test-client.log 
2019-03-16 00:59:38.003+0000: shutting down, reason=failed
2019-03-16 01:15:28.349+0000: shutting down, reason=failed
2019-03-16 01:28:57.159+0000: shutting down, reason=failed
2019-03-16 01:29:04.283+0000: shutting down, reason=failed
2019-03-16 01:29:51.729+0000: shutting down, reason=failed
2019-03-16 01:31:44.493+0000: shutting down, reason=failed

/var/log/qemu-ga is an empty directory.

Tailing the journal while starting a VM shows:

Mar 16 03:52:44 localhost.localdomain vdsm[10633]: WARN Attempting to add an existing net user: ovirtmgmt/a3b4d8de-f2d3-4272-843c-fba78751f481
Mar 16 03:52:45 localhost.localdomain libvirtd[6416]: 2019-03-16 01:52:45.401+0000: 6466: error : virCPUx86Compare:1731 : the CPU is incompatible with host CPU: Host CPU does not provide required features: monitor
Mar 16 03:52:45 localhost.localdomain vdsm[10633]: WARN File: /var/lib/libvirt/qemu/channels/a3b4d8de-f2d3-4272-843c-fba78751f481.ovirt-guest-agent.0 already removed
Mar 16 03:52:45 localhost.localdomain vdsm[10633]: WARN Attempting to remove a non existing network: ovirtmgmt/a3b4d8de-f2d3-4272-843c-fba78751f481
Mar 16 03:52:45 localhost.localdomain vdsm[10633]: WARN Attempting to remove a non existing net user: ovirtmgmt/a3b4d8de-f2d3-4272-843c-fba78751f481
Mar 16 03:52:45 localhost.localdomain vdsm[10633]: WARN File: /var/lib/libvirt/qemu/channels/a3b4d8de-f2d3-4272-843c-fba78751f481.org.qemu.guest_agent.0 already removed
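
When scanning many journal lines, the missing feature names can be extracted from the libvirt error with a small regular expression. This is a rough sketch that assumes the message wording quoted in this report; other libvirt versions may phrase it differently, and the comma-separated form for several features is an assumption. `missing_features` is a hypothetical helper name.

```python
import re

# Matches the libvirt wording quoted in this report (assumption: other
# libvirt versions may word the message differently).
PATTERN = re.compile(
    r"Host CPU does not provide required features: (?P<features>.+)$"
)


def missing_features(line):
    """Return the feature names listed in a libvirt CPU-compat error line."""
    match = PATTERN.search(line.strip())
    if not match:
        return []
    # Assumption: several features would be separated by commas.
    return [f.strip() for f in match.group("features").split(",") if f.strip()]


LINE = (
    "libvirtd[6416]: 2019-03-16 01:52:45.401+0000: 6466: error : "
    "virCPUx86Compare:1731 : the CPU is incompatible with host CPU: "
    "Host CPU does not provide required features: monitor"
)
print(missing_features(LINE))
```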

Comment 9 Michal Skrivanek 2019-03-16 05:27:53 UTC
Please attach /proc/cpuinfo from L0 host, and domcapabilities output. Then the same from your nested L1 host, plus its domain xml from libvirt. If you manage to start a nested guest manually, can you please also get qemu cmdline and cpuinfo from the L2 guest?
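
To compare the requested /proc/cpuinfo dumps between L0 and L1 quickly, the flags line can be parsed into a set and checked for a given flag. A minimal sketch follows; the sample text is a shortened, made-up cpuinfo excerpt rather than data from the attachments, and `cpu_flags` is a name introduced for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Shortened, made-up excerpt; a real /proc/cpuinfo lists many more flags.
SAMPLE_CPUINFO = """processor\t: 0
vendor_id\t: AuthenticAMD
flags\t\t: fpu vme de pse svm
"""

flags = cpu_flags(SAMPLE_CPUINFO)
print("monitor" in flags)
```

On each level, the text would be read from /proc/cpuinfo (for example with `open("/proc/cpuinfo").read()`) and the resulting sets compared to see which flags the nested host actually receives.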

Comment 10 Hetz Ben Hamo 2019-03-16 11:12:12 UTC
As requested, I'm including a dump of the L0 and L1 cpuinfo and domcapabilities.
I'm also including the dumpxml of ovirt-node1 as well as the dumpxml of the CentOS 7 guest.

I found something very interesting:

On the host (Fedora 29 with a Ryzen 7) I created a CentOS 7 guest with nested virtualization and installed a CentOS 7 guest inside it (so: Fedora host -> CentOS nested -> CentOS guest without nesting) - this works perfectly.

However, when I launched oVirt Node (latest - 4.3.1) as a guest named ovirt-node-1 with nested virtualization and tried to launch a VM inside it using virsh (CentOS 7 guest, no nesting), it stopped with the CPU error about monitor.

So it seems that the problem is related to the Node-NG appliance, which I installed as ovirt-node-1. On a standard CentOS guest with nested virtualization, everything works with no errors.

So, how can I find what causes it in Node-NG?

Comment 11 Hetz Ben Hamo 2019-03-16 11:13 UTC
Created attachment 1544756
Dump XML as requested

Comment 12 Hetz Ben Hamo 2019-03-16 11:20:40 UTC
Just to make myself clear - all VMs were created on the host (Fedora 29) using virt-manager.

Comment 13 Hetz Ben Hamo 2019-03-16 13:12:38 UTC
After researching further, I found the following issue:

I installed CentOS as an L1 guest with nested virtualization, added the oVirt repo, and started the hosted-engine deployment.

It creates the HE VM and launches it, and it works well (I can access it on port 6900).

However, at the storage stage, after I give it the NFS share and continue the deployment, it creates the new HE VM on the NFS share and moves the data. When it then tries to launch the new VM, the VM keeps going up and down.

While it was going up and down, I manually mounted my virtual machines and tried to launch a VM using virsh (virsh create).

And .. surprise surprise: 

# virsh create nfs-server.xml
Please enter your authentication name: hetz
Please enter your password: 
error: Failed to create domain from nfs-server.xml
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: monitor

Prior to deploying the HE on this VM, KVM inside the guest OS worked perfectly well with virsh. After the failed deployment, I got the error above.

I'm enclosing all of the VDSM logs, as the ansible logs don't show anything relevant.
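
The failure above amounts to a domain definition requiring a CPU feature that domcapabilities marks as 'disable'. Such conflicts can be detected before calling virsh create. The sketch below uses abbreviated, hypothetical XML for both documents (the contents of nfs-server.xml are not shown in this report, so the CPU element here is an assumption), and `conflicting_features` is a helper name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated inputs; only the 'monitor' feature name and its
# 'disable' policy come from this report.
DOMCAPS = """<domainCapabilities><cpu>
  <mode name='host-model' supported='yes'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <feature policy='disable' name='monitor'/>
  </mode>
</cpu></domainCapabilities>"""

DOMAIN = """<domain type='kvm'>
  <name>nfs-server</name>
  <cpu match='exact'>
    <model>EPYC</model>
    <feature policy='require' name='monitor'/>
  </cpu>
</domain>"""


def features_with_policy(parent, policy):
    """Names of direct <feature> children of `parent` with the given policy."""
    return {f.get("name") for f in parent.findall("feature")
            if f.get("policy") == policy}


def conflicting_features(domain_xml, domcaps_xml):
    """Features the domain requires but the host-model marks as disabled."""
    dom_cpu = ET.fromstring(domain_xml).find("cpu")
    if dom_cpu is None:
        return []
    required = features_with_policy(dom_cpu, "require")
    disabled = set()
    for mode in ET.fromstring(domcaps_xml).findall("./cpu/mode"):
        if mode.get("name") == "host-model":
            disabled |= features_with_policy(mode, "disable")
    return sorted(required & disabled)


print(conflicting_features(DOMAIN, DOMCAPS))
```

A non-empty result would point at the features to drop from (or not add to) the domain XML, which is what "respecting domcapabilities" would mean in practice.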

Comment 14 Hetz Ben Hamo 2019-03-16 13:13 UTC
Created attachment 1544820
VDSM logs after running hosted-engine deployment

Comment 15 Hetz Ben Hamo 2019-03-16 13:19:38 UTC
Update #3: When running the HE as a standalone VM (not deployed with hosted-engine --deploy) and adding a nested VM as a "node", the same issue appears on this new "node".

Hope this helps...

Comment 16 Ryan Barry 2019-03-17 17:54:05 UTC
Thanks, Hetz. I'll look at the logs tomorrow.

vdsm does try to do CPU detection and set a host model appropriately (including HE setups -- you would have been prompted for this as part of the deployment), but we may be missing something here...

Comment 17 Ryan Barry 2019-03-21 00:08:59 UTC
Confirmed, and I know for sure that this doesn't happen with nested Intel CPUs, since I use them regularly

