Bug 2140808

Summary: Hyperv feature set to "enabled: false" prevents scheduling
Product: Container Native Virtualization (CNV)
Component: Virtualization
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Version: 4.11.0
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: hco-bundle-registry-container-v4.12.0-714
Type: Bug
Reporter: Jenifer Abrams <jhopper>
Assignee: Barak <bmordeha>
QA Contact: zhe peng <zpeng>
CC: acardace, ailan, akrgupta, chayang, dgilbert, fdeutsch, jdenemar, jsuchane, kbidarka, mtessun, pelauter, sgott, twiederh, vkuznets
Last Closed: 2023-01-24 13:41:51 UTC

Description Jenifer Abrams 2022-11-07 21:38:51 UTC
Description of problem:
When running on an AMD system, if a Windows VM is created using:
          hyperv:
            evmcs:
              enabled: false

scheduling will fail due to:
Node-Selectors:  cpu-feature.node.kubevirt.io/vmx=true
                 cpu-vendor.node.kubevirt.io/Intel=true

Version-Release number of selected component (if applicable):
4.11.0

How reproducible:
Every time

Steps to Reproduce:
1. Create a VM with evmcs explicitly set to false (see the example manifest below), targeted to run on an AMD node
2. oc describe the virt-launcher pod to see the scheduling failure (node(s) didn't match Pod's node affinity/selector)
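
For reference, a minimal manifest for step 1 might look like the following; the VM name and the non-hyperv fields (disk, volume, memory) are illustrative, while the field path spec.template.spec.domain.features.hyperv.evmcs follows the KubeVirt VirtualMachine API:

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    name: win-vm-example              # illustrative name
  spec:
    running: true
    template:
      spec:
        domain:
          features:
            hyperv:
              evmcs:
                enabled: false        # explicitly disabled, yet the vmx/Intel selectors are still added
          devices:
            disks:
            - name: rootdisk
              disk:
                bus: virtio
          resources:
            requests:
              memory: 4Gi
        volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/example/windows:latest   # illustrative image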

Actual results:
The virt-launcher pod cannot be scheduled.

Expected results:
If the feature is explicitly disabled, it should not prevent scheduling.

Comment 6 Jaroslav Suchanek 2022-11-10 17:30:04 UTC
What are the versions of the libvirt components, please? What is the output of virsh domcapabilities? Tim, Jiri, any idea what's going on here?

Comment 7 Jenifer Abrams 2022-11-10 18:10:58 UTC
CNV 4.11 virt-launcher pod uses:

$ rpm -qa | egrep 'qemu|libvirt'
libvirt-libs-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
libvirt-daemon-driver-qemu-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
qemu-kvm-core-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
qemu-img-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
qemu-kvm-common-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
libvirt-client-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
qemu-kvm-hw-usbredir-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
ipxe-roms-qemu-20181214-9.git133f4c47.el8.noarch

The pod won't schedule when evmcs is listed (as disabled) in the VM YAML, but here is virsh domcapabilities output from a virt-launcher pod running *without* evmcs listed:

$ virsh domcapabilities
Authorization not available. Check if polkit service is running or see debug message for more information.
<domainCapabilities>
  <path>/usr/libexec/qemu-kvm</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-rhel7.6.0</machine>
  <arch>x86_64</arch>
  <vcpu max='240'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'/>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE.secboot.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='maximum' supported='yes'>
      <enum name='maximumMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Milan</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='avx512f'/>
      <feature policy='require' name='avx512dq'/>
      <feature policy='require' name='avx512ifma'/>
      <feature policy='require' name='avx512cd'/>
      <feature policy='require' name='avx512bw'/>
      <feature policy='require' name='avx512vl'/>
      <feature policy='require' name='avx512vbmi'/>
      <feature policy='require' name='avx512vbmi2'/>
      <feature policy='require' name='gfni'/>
      <feature policy='require' name='vaes'/>
      <feature policy='require' name='vpclmulqdq'/>
      <feature policy='require' name='avx512vnni'/>
      <feature policy='require' name='avx512bitalg'/>
      <feature policy='require' name='avx512-vpopcntdq'/>
      <feature policy='require' name='la57'/>
      <feature policy='require' name='spec-ctrl'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='avx512-bf16'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='no'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='no'>coreduo</model>
      <model usable='no'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='yes'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Snowridge</model>
      <model usable='yes'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='yes'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='yes'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='yes'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='yes'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='yes'>IvyBridge-IBRS</model>
      <model usable='yes'>IvyBridge</model>
      <model usable='yes'>Icelake-Server-noTSX</model>
      <model usable='no'>Icelake-Server</model>
      <model usable='yes'>Icelake-Client-noTSX</model>
      <model usable='no' deprecated='yes'>Icelake-Client</model>
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='yes'>EPYC-Milan</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>
      <model usable='yes'>Dhyana</model>
      <model usable='no'>Cooperlake</model>
      <model usable='yes'>Conroe</model>
      <model usable='no'>Cascadelake-Server-noTSX</model>
      <model usable='no'>Cascadelake-Server</model>
      <model usable='yes'>Broadwell-noTSX-IBRS</model>
      <model usable='yes'>Broadwell-noTSX</model>
      <model usable='no'>Broadwell-IBRS</model>
      <model usable='no'>Broadwell</model>
      <model usable='yes'>486</model>
    </mode>
  </cpu>
  <memoryBacking supported='yes'>
    <enum name='sourceType'>
      <value>file</value>
      <value>anonymous</value>
      <value>memfd</value>
    </enum>
  </memoryBacking>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>ide</value>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>vnc</value>
        <value>spice</value>
        <value>egl-headless</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'/>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
        <value>builtin</value>
      </enum>
    </rng>
    <filesystem supported='yes'>
      <enum name='driverType'>
        <value>path</value>
        <value>handle</value>
        <value>virtiofs</value>
      </enum>
    </filesystem>
    <tpm supported='yes'>
      <enum name='model'>
        <value>tpm-tis</value>
        <value>tpm-crb</value>
      </enum>
      <enum name='backendModel'>
        <value>passthrough</value>
        <value>emulator</value>
      </enum>
    </tpm>
  </devices>
  <features>
    <gic supported='no'/>
    <vmcoreinfo supported='yes'/>
    <genid supported='yes'/>
    <backingStoreInput supported='yes'/>
    <backup supported='yes'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>

Comment 8 Jiri Denemark 2022-11-11 08:44:39 UTC
It's hard to tell without seeing what libvirt actions all of this resulted in,
but it looks like a host-model CPU is used which, judging from the domcapabilities
output, does not enable vmx. And looking at the oc output above, I can see

    Cpu:
      Cores:  1
      Features:
        Name:    vmx
        Policy:  require
      Model:     host-model
      Sockets:   1
      Threads:   1

and

    cpu:
      cores: 1
      features:
      - name: vmx
        policy: require
      model: host-model
      sockets: 1
      threads: 1

which looks like the source for the libvirt XML, i.e., something is explicitly
asking for host-model with vmx enabled. We wouldn't see "host-model" there if
this were a translation of the domain XML after the domain was started (just
before starting the domain, libvirt replaces "host-model" with the actual CPU
definition from domain capabilities). Also, if I understand correctly, the
domain is not even allowed to start because of this.

In other words, from what I see (although the picture is unfortunately far
from complete), I'm pretty sure this issue lies somewhere above libvirt.
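
For context, EVMCS (Enlightened VMCS) is an Intel-only Hyper-V enlightenment built on top of VMX, which is why requesting it drags in a vmx CPU-feature requirement and an Intel vendor constraint. A hedged sketch of the mapping implied by the two outputs quoted above (which KubeVirt component injects the CPU feature is not visible in this report):

  # What the user writes (from the report):
  spec:
    domain:
      features:
        hyperv:
          evmcs:
            enabled: false

  # What ends up in the effective spec (from the oc output above),
  # even though the feature was explicitly disabled:
  spec:
    domain:
      cpu:
        model: host-model
        features:
        - name: vmx
          policy: require

The bug, then, appears to be that the injection keys off the presence of the evmcs stanza rather than its enabled value.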

Comment 10 Kedar Bidarkar 2022-11-23 11:13:11 UTC
Though this was tested on AMD nodes, we still see the following entry on the virt-launcher pod:
nodeSelector:
  cpu-vendor.node.kubevirt.io/Intel: "true"

Hence scheduling still fails when we add the following under "hyperv:" in the VM spec:

evmcs:
  enabled: false

Tested with CNV v4.12.0-714.


Additional Info:

I think only the following was dropped from the nodeSelector:
"cpu-feature.node.kubevirt.io/vmx=true"

Comment 11 Barak 2022-12-04 14:23:36 UTC
(In reply to Kedar Bidarkar from comment #10)
> I think only the following was dropped from the nodeSelector:
> "cpu-feature.node.kubevirt.io/vmx=true"

A PR to handle the vendor label was merged:
https://github.com/kubevirt/kubevirt/pull/8861

Comment 13 zhe peng 2022-12-13 06:54:45 UTC
Verified with build:
OCP-4.12.0-rc.4
CNV-v4.12.0-760

Steps:
1. Prepare OCP+CNV on an AMD cluster.
2. Create a Win10 VM.
3. Add the following to the VM YAML file:
...
        hyperv:
          evmcs:
            enabled: false
...
4. Start the VM.
5. Windows boots and can be logged into.
6. Check the virt-launcher Node-Selectors:
Node-Selectors:  hyperv.node.kubevirt.io/frequencies=true
                 hyperv.node.kubevirt.io/ipi=true
                 hyperv.node.kubevirt.io/reenlightenment=true
                 hyperv.node.kubevirt.io/reset=true
                 hyperv.node.kubevirt.io/runtime=true
                 hyperv.node.kubevirt.io/synic=true
                 hyperv.node.kubevirt.io/synictimer=true
                 hyperv.node.kubevirt.io/tlbflush=true
                 hyperv.node.kubevirt.io/vpindex=true
                 kubevirt.io/schedulable=true

Comment 14 Kedar Bidarkar 2022-12-14 16:54:20 UTC
VERIFIED with CNV-v4.12.0-760

Comment 18 errata-xmlrpc 2023-01-24 13:41:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408

Comment 19 Red Hat Bugzilla 2023-09-19 04:29:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days