Bug 2140808 - Hyperv feature set to "enabled: false" prevents scheduling
Status: CLOSED ERRATA
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.11.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Barak
QA Contact: zhe peng
Reported: 2022-11-07 21:38 UTC by Jenifer Abrams
Modified: 2023-09-19 04:29 UTC
CC: 14 users

Fixed In Version: hco-bundle-registry-container-v4.12.0-714
Last Closed: 2023-01-24 13:41:51 UTC




Links:
  Github kubevirt/kubevirt PR 8793 (Merged): [release-0.58] Fix evmcs hyperv bug (last updated 2022-11-18 08:59:28 UTC)
  Github kubevirt/kubevirt PR 8861 (open): [release-0.58] Fix evmcs hyperv vendor bug (last updated 2022-11-27 09:53:29 UTC)
  Red Hat Issue Tracker CNV-22326 (last updated 2022-11-15 18:46:20 UTC)
  Red Hat Product Errata RHSA-2023:0408 (last updated 2023-01-24 13:41:58 UTC)

Description Jenifer Abrams 2022-11-07 21:38:51 UTC
Description of problem:
When running on an AMD system, if a Windows VM is created using:
          hyperv:
            evmcs:
              enabled: false

scheduling will fail due to:
Node-Selectors:  cpu-feature.node.kubevirt.io/vmx=true
                 cpu-vendor.node.kubevirt.io/Intel=true
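
For context, this stanza sits under the VM's domain features. A minimal sketch of the relevant portion of a VirtualMachine spec (the VM name is illustrative):

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    name: win-vm                 # illustrative name
  spec:
    template:
      spec:
        domain:
          features:
            hyperv:
              evmcs:
                enabled: false   # explicitly disabled, yet the selectors above are still applied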

Version-Release number of selected component (if applicable):
4.11.0

How reproducible:
Every time

Steps to Reproduce:
1. Create a VM with evmcs explicitly set to false, selected to run on an AMD node
2. Run oc describe on the virt-launcher pod to see the scheduling failure (node(s) didn't match Pod's node affinity/selector); a sketch of this check follows below
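
A sketch of the check in step 2 (the pod name is a placeholder):

$ oc describe pod virt-launcher-<vm-name>-<hash> | grep -A1 'Node-Selectors'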

Actual results:
The virt-launcher pod cannot be scheduled.

Expected results:
If the feature is explicitly disabled, it should not prevent scheduling.

Comment 6 Jaroslav Suchanek 2022-11-10 17:30:04 UTC
What are the versions of the libvirt components, please? What is the output of virsh domcapabilities? Tim, Jiri, any idea what's going on here?

Comment 7 Jenifer Abrams 2022-11-10 18:10:58 UTC
CNV 4.11 virt-launcher pod uses:

$ rpm -qa | egrep 'qemu|libvirt'
libvirt-libs-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
libvirt-daemon-driver-qemu-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
qemu-kvm-core-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
qemu-img-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
qemu-kvm-common-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
libvirt-client-8.0.0-5.module+el8.6.0+14495+7194fa43.x86_64
qemu-kvm-hw-usbredir-6.2.0-11.module+el8.6.0+14712+f96656d3.x86_64
ipxe-roms-qemu-20181214-9.git133f4c47.el8.noarch

The pod won't schedule when evmcs is listed (as disabled) in the VM yaml, but here is the virsh domcapabilities output from a virt-launcher pod running *without* evmcs listed:

$ virsh domcapabilities
Authorization not available. Check if polkit service is running or see debug message for more information.
<domainCapabilities>
  <path>/usr/libexec/qemu-kvm</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-rhel7.6.0</machine>
  <arch>x86_64</arch>
  <vcpu max='240'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'/>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE.secboot.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='maximum' supported='yes'>
      <enum name='maximumMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Milan</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='avx512f'/>
      <feature policy='require' name='avx512dq'/>
      <feature policy='require' name='avx512ifma'/>
      <feature policy='require' name='avx512cd'/>
      <feature policy='require' name='avx512bw'/>
      <feature policy='require' name='avx512vl'/>
      <feature policy='require' name='avx512vbmi'/>
      <feature policy='require' name='avx512vbmi2'/>
      <feature policy='require' name='gfni'/>
      <feature policy='require' name='vaes'/>
      <feature policy='require' name='vpclmulqdq'/>
      <feature policy='require' name='avx512vnni'/>
      <feature policy='require' name='avx512bitalg'/>
      <feature policy='require' name='avx512-vpopcntdq'/>
      <feature policy='require' name='la57'/>
      <feature policy='require' name='spec-ctrl'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='avx512-bf16'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='no'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='no'>coreduo</model>
      <model usable='no'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='yes'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Snowridge</model>
      <model usable='yes'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='yes'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='yes'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='yes'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='yes'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='yes'>IvyBridge-IBRS</model>
      <model usable='yes'>IvyBridge</model>
      <model usable='yes'>Icelake-Server-noTSX</model>
      <model usable='no'>Icelake-Server</model>
      <model usable='yes'>Icelake-Client-noTSX</model>
      <model usable='no' deprecated='yes'>Icelake-Client</model>
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='yes'>EPYC-Milan</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>
      <model usable='yes'>Dhyana</model>
      <model usable='no'>Cooperlake</model>
      <model usable='yes'>Conroe</model>
      <model usable='no'>Cascadelake-Server-noTSX</model>
      <model usable='no'>Cascadelake-Server</model>
      <model usable='yes'>Broadwell-noTSX-IBRS</model>
      <model usable='yes'>Broadwell-noTSX</model>
      <model usable='no'>Broadwell-IBRS</model>
      <model usable='no'>Broadwell</model>
      <model usable='yes'>486</model>
    </mode>
  </cpu>
  <memoryBacking supported='yes'>
    <enum name='sourceType'>
      <value>file</value>
      <value>anonymous</value>
      <value>memfd</value>
    </enum>
  </memoryBacking>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>ide</value>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>vnc</value>
        <value>spice</value>
        <value>egl-headless</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'/>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
        <value>builtin</value>
      </enum>
    </rng>
    <filesystem supported='yes'>
      <enum name='driverType'>
        <value>path</value>
        <value>handle</value>
        <value>virtiofs</value>
      </enum>
    </filesystem>
    <tpm supported='yes'>
      <enum name='model'>
        <value>tpm-tis</value>
        <value>tpm-crb</value>
      </enum>
      <enum name='backendModel'>
        <value>passthrough</value>
        <value>emulator</value>
      </enum>
    </tpm>
  </devices>
  <features>
    <gic supported='no'/>
    <vmcoreinfo supported='yes'/>
    <genid supported='yes'/>
    <backingStoreInput supported='yes'/>
    <backup supported='yes'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>

Comment 8 Jiri Denemark 2022-11-11 08:44:39 UTC
It's hard to tell without seeing what libvirt operations all of this resulted in.
But it looks like a host-model CPU is used which, looking at the domcapabilities
output, does not enable vmx. And looking at the oc output above, I can see

    Cpu:
      Cores:  1
      Features:
        Name:    vmx
        Policy:  require
      Model:     host-model
      Sockets:   1
      Threads:   1

and

    cpu:
      cores: 1
      features:
      - name: vmx
        policy: require
      model: host-model
      sockets: 1
      threads: 1

which looks like the source for the libvirt XML, i.e., something is explicitly
asking for host-model with vmx enabled. We wouldn't see "host-model" there if
this were a translation of the domain XML after the domain was started (just
before starting the domain, libvirt replaces "host-model" with the actual CPU
definition from the domain capabilities). Also, if I understand correctly, the
domain is not even allowed to start because of this.

In other words, from what I see (although the picture is unfortunately far
from complete), I'm pretty sure this issue lies somewhere above libvirt.
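
For reference, the CPU definition quoted above corresponds to a libvirt domain XML fragment roughly like the following sketch; with policy='require', libvirt refuses to start the guest unless the host can provide the feature, which can never succeed for vmx on an AMD host:

<cpu mode='host-model'>
  <feature policy='require' name='vmx'/>
</cpu>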

Comment 10 Kedar Bidarkar 2022-11-23 11:13:11 UTC
Although this was tested on AMD nodes, we still see the following entry on the virt-launcher pod:
nodeSelector:
  cpu-vendor.node.kubevirt.io/Intel: "true"

Hence scheduling still fails when we add the following under "hyperv:" in the VM spec:

evmcs:
  enabled: false

Tested with CNV v4.12.0-714.


Additional Info:

I think we only dropped the following from the nodeSelector:
"cpu-feature.node.kubevirt.io/vmx=true"

Comment 11 Barak 2022-12-04 14:23:36 UTC
(In reply to Kedar Bidarkar from comment #10)
> Although this was tested on AMD nodes, we still see the following entry on
> the virt-launcher pod:
> nodeSelector:
>   cpu-vendor.node.kubevirt.io/Intel: "true"
> 
> Hence scheduling still fails when we add the following under "hyperv:" in
> the VM spec:
> 
> evmcs:
>   enabled: false
> 
> Tested with CNV v4.12.0-714.
> 
> 
> Additional Info:
> 
> I think we only dropped the following from the nodeSelector:
> "cpu-feature.node.kubevirt.io/vmx=true"

A PR to handle the vendor label was merged:
https://github.com/kubevirt/kubevirt/pull/8861

Comment 13 zhe peng 2022-12-13 06:54:45 UTC
Verified with build:
OCP-4.12.0-rc.4
CNV-v4.12.0-760

Steps:
1. Prepare OCP + CNV on an AMD cluster
2. Create a win10 VM
3. Add the following to the VM yaml file:
...
       hyperv:
         evmcs:
           enabled: false
...
4. Start the VM
5. Windows is running and can be logged in to.
6. Check the virt-launcher nodeSelector:
Node-Selectors:  hyperv.node.kubevirt.io/frequencies=true
                 hyperv.node.kubevirt.io/ipi=true
                 hyperv.node.kubevirt.io/reenlightenment=true
                 hyperv.node.kubevirt.io/reset=true
                 hyperv.node.kubevirt.io/runtime=true
                 hyperv.node.kubevirt.io/synic=true
                 hyperv.node.kubevirt.io/synictimer=true
                 hyperv.node.kubevirt.io/tlbflush=true
                 hyperv.node.kubevirt.io/vpindex=true
                 kubevirt.io/schedulable=true

Comment 14 Kedar Bidarkar 2022-12-14 16:54:20 UTC
VERIFIED with CNV-v4.12.0-760

Comment 18 errata-xmlrpc 2023-01-24 13:41:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408

Comment 19 Red Hat Bugzilla 2023-09-19 04:29:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

