Bug 2184623 - [RHV] Host Non-Operation after update Cluster CPU to Secure Intel Icelake Server. Missing CPU feature: taa-no
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.5.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.5.3-async
Target Release: ---
Assignee: Lucia Jelinkova
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-05 08:53 UTC by José Enrique
Modified: 2023-06-21 19:54 UTC (History)

Fixed In Version: ovirt-engine-4.5.3.8-2.el8ev
Doc Type: Bug Fix
Doc Text:
Previously, a host with Secure Intel Icelake Server Family could become non-operational because it did not provide the "taa-no" CPU feature. In this release, the check has been fixed in the Manager, and such hosts work properly.
Clone Of:
Environment:
Last Closed: 2023-06-21 19:54:24 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-engine pull 837 | 0 | None | Merged | engine: Remove taa-no from Secure Skylake Server | 2023-04-25 07:41:42 UTC
Red Hat Product Errata | RHSA-2023:3771 | 0 | None | None | None | 2023-06-21 19:54:39 UTC

Description José Enrique 2023-04-05 08:53:11 UTC
Description of problem:
A host with an Intel Xeon Ice Lake CPU becomes Non Operational after the cluster CPU type is updated to "Secure Intel Icelake Server".

Version-Release number of selected component (if applicable):

RHVM: 4.5.3.7

How reproducible:

RHVM: 4.5.3.7
Cluster compatibility version: 4.7
RHVH Host: 4.5.3

Upgrade CPU Type from Intel Icelake Server Family -> Secure Intel Icelake Server Family.

RHV-M then sets the host to "NonOperational" with the message "host does not meet the cluster's minimum CPU level. Missing CPU features : taa-no"

Actual results:

The host is put into "NonOperational" status.

Expected results:

The Secure Intel Icelake Server CPU type is enabled and the host remains operational.

Additional info:

RHV Manager detects that the host is not affected by TAA, but it reports this as a kernel feature, not as a CPU feature.

Comment 1 Klaas Demter 2023-04-05 11:35:06 UTC
Additional information: https://www.qemu.org/docs/master/system/qemu-cpu-models.html?highlight=taa-no

    Recommended to inform that the guest that the host is not vulnerable to CVE-2019-11135, TSX Asynchronous Abort (TAA).
    This too is an MSR feature, so it does not show up in the Linux /proc/cpuinfo in the host or guest.
    It should only be enabled for VMs if the host reports Not affected in the /sys/devices/system/cpu/vulnerabilities/tsx_async_abort file.


It seems like RHV is not able to understand that this is not a cpu flag.
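
For anyone who wants to script the check from the quoted QEMU documentation, here is a minimal sketch. It only reads the sysfs file named above; the function name and the decision to treat a missing file as "not safe" are assumptions made for illustration, not anything RHV or QEMU does.

```python
from pathlib import Path

# sysfs file named in the QEMU documentation excerpt above.
TAA_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/tsx_async_abort")


def host_not_affected_by_taa(path: Path = TAA_SYSFS) -> bool:
    """Return True only if the kernel reports 'Not affected' for TAA.

    Hypothetical helper: a missing or unreadable file is treated as
    'unknown', which this sketch maps to False (do not enable taa-no).
    """
    try:
        status = path.read_text().strip()
    except OSError:
        return False
    return status == "Not affected"


if __name__ == "__main__":
    print(host_not_affected_by_taa())
```

On the two hosts shown later in this report, where the file contains "Not affected", the helper would return True.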

Comment 2 Lucia Jelinkova 2023-04-11 11:16:59 UTC
RHV actually calls "cpu flags" a combination (set) of two things reported by the host:

1. the flags from /proc/cpuinfo
2. the features from virsh domcapabilities

The "taa-no" should be listed as a feature. Could you please provide us with the output of "virsh domcapabilities" command executed on the host?
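
To illustrate the combination described in this comment, here is a rough sketch that builds the set of "cpu flags" from the two sources and compares it with a list of required features. It is not the actual VDSM or engine code: the function names are made up, reading only the host-model features with policy='require' is an assumption, and no name normalization (e.g. md_clear vs. md-clear) is attempted.

```python
import xml.etree.ElementTree as ET


def cpuinfo_flags(cpuinfo_text: str) -> set:
    """Flags from the first 'flags' line of /proc/cpuinfo output."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def domcaps_features(domcaps_xml: str) -> set:
    """Features marked policy='require' in the host-model CPU mode
    of 'virsh domcapabilities' output (assumed to be the relevant ones)."""
    root = ET.fromstring(domcaps_xml)
    return {
        feature.get("name")
        for feature in root.findall("./cpu/mode[@name='host-model']/feature")
        if feature.get("policy") == "require"
    }


def missing_features(required, cpuinfo_text: str, domcaps_xml: str) -> set:
    """Required features that appear in neither source."""
    reported = cpuinfo_flags(cpuinfo_text) | domcaps_features(domcaps_xml)
    return set(required) - reported


if __name__ == "__main__":
    sample_xml = (
        "<domainCapabilities><cpu><mode name='host-model' supported='yes'>"
        "<feature policy='require' name='mds-no'/>"
        "<feature policy='disable' name='rtm'/>"
        "</mode></cpu></domainCapabilities>"
    )
    sample_cpuinfo = "processor : 0\nflags : fpu vmx\n"
    print(missing_features({"vmx", "mds-no", "taa-no"}, sample_cpuinfo, sample_xml))
```

With the sample input, only taa-no is left over, which matches the shape of the "Missing CPU features" message quoted in the description.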

Comment 3 Klaas Demter 2023-04-11 12:14:19 UTC
System A:
# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf domcapabilities
<domainCapabilities>
  <path>/usr/libexec/qemu-kvm</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-rhel7.6.0</machine>
  <arch>x86_64</arch>
  <vcpu max='240'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'/>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE.secboot.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='maximum' supported='yes'>
      <enum name='maximumMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Icelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='pdcm'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='avx512ifma'/>
      <feature policy='require' name='sha-ni'/>
      <feature policy='require' name='rdpid'/>
      <feature policy='require' name='fsrm'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='ibrs'/>
      <feature policy='require' name='amd-stibp'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='ibrs-all'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='require' name='tsx-ctrl'/>
      <feature policy='disable' name='hle'/>
      <feature policy='disable' name='rtm'/>
      <feature policy='disable' name='mpx'/>
      <feature policy='disable' name='intel-pt'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='yes'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='yes'>coreduo</model>
      <model usable='yes'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='yes'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Snowridge</model>
      <model usable='yes'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='yes'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='yes'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='no'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='yes'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='yes'>IvyBridge-IBRS</model>
      <model usable='yes'>IvyBridge</model>
      <model usable='yes'>Icelake-Server-noTSX</model>
      <model usable='no'>Icelake-Server</model>
      <model usable='yes'>Icelake-Client-noTSX</model>
      <model usable='no' deprecated='yes'>Icelake-Client</model>
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>
      <model usable='no'>EPYC-Rome</model>
      <model usable='no'>EPYC-Milan</model>
      <model usable='no'>EPYC-IBPB</model>
      <model usable='no'>EPYC</model>
      <model usable='no'>Dhyana</model>
      <model usable='no'>Cooperlake</model>
      <model usable='yes'>Conroe</model>
      <model usable='yes'>Cascadelake-Server-noTSX</model>
      <model usable='no'>Cascadelake-Server</model>
      <model usable='yes'>Broadwell-noTSX-IBRS</model>
      <model usable='yes'>Broadwell-noTSX</model>
      <model usable='no'>Broadwell-IBRS</model>
      <model usable='no'>Broadwell</model>
      <model usable='yes'>486</model>
    </mode>
  </cpu>
  <memoryBacking supported='yes'>
    <enum name='sourceType'>
      <value>file</value>
      <value>anonymous</value>
      <value>memfd</value>
    </enum>
  </memoryBacking>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>ide</value>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>vnc</value>
        <value>spice</value>
        <value>egl-headless</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>qxl</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'/>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
        <value>builtin</value>
      </enum>
    </rng>
    <filesystem supported='yes'>
      <enum name='driverType'>
        <value>path</value>
        <value>handle</value>
        <value>virtiofs</value>
      </enum>
    </filesystem>
    <tpm supported='yes'>
      <enum name='model'>
        <value>tpm-tis</value>
        <value>tpm-crb</value>
      </enum>
      <enum name='backendModel'>
        <value>passthrough</value>
        <value>emulator</value>
      </enum>
    </tpm>
  </devices>
  <features>
    <gic supported='no'/>
    <vmcoreinfo supported='yes'/>
    <genid supported='yes'/>
    <backingStoreInput supported='yes'/>
    <backup supported='yes'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>


# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 106
model name      : Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
stepping        : 6
microcode       : 0xd000389
cpu MHz         : 3300.000
cache size      : 18432 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 27
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data eibrs_pbrsb
bogomips        : 4200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 57 bits virtual
power management:

[...]

# cat /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
Not affected


System B:
# virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf domcapabilities
<domainCapabilities>
  <path>/usr/libexec/qemu-kvm</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-rhel7.6.0</machine>
  <arch>x86_64</arch>
  <vcpu max='240'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'/>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE.secboot.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'>
      <enum name='hostPassthroughMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='maximum' supported='yes'>
      <enum name='maximumMigratable'>
        <value>on</value>
        <value>off</value>
      </enum>
    </mode>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Icelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='pdcm'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='avx512ifma'/>
      <feature policy='require' name='sha-ni'/>
      <feature policy='require' name='rdpid'/>
      <feature policy='require' name='fsrm'/>
      <feature policy='require' name='md-clear'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='ibpb'/>
      <feature policy='require' name='ibrs'/>
      <feature policy='require' name='amd-stibp'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='ibrs-all'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
      <feature policy='require' name='tsx-ctrl'/>
      <feature policy='disable' name='hle'/>
      <feature policy='disable' name='rtm'/>
      <feature policy='disable' name='mpx'/>
      <feature policy='disable' name='intel-pt'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='yes'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='yes'>coreduo</model>
      <model usable='yes'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='yes'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Snowridge</model>
      <model usable='yes'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='yes'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='yes'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='no'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='yes'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='yes'>IvyBridge-IBRS</model>
      <model usable='yes'>IvyBridge</model>
      <model usable='yes'>Icelake-Server-noTSX</model>
      <model usable='no'>Icelake-Server</model>
      <model usable='yes'>Icelake-Client-noTSX</model>
      <model usable='no' deprecated='yes'>Icelake-Client</model>
      <model usable='yes'>Haswell-noTSX-IBRS</model>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>
      <model usable='no'>EPYC-Rome</model>
      <model usable='no'>EPYC-Milan</model>
      <model usable='no'>EPYC-IBPB</model>
      <model usable='no'>EPYC</model>
      <model usable='no'>Dhyana</model>
      <model usable='no'>Cooperlake</model>
      <model usable='yes'>Conroe</model>
      <model usable='yes'>Cascadelake-Server-noTSX</model>
      <model usable='no'>Cascadelake-Server</model>
      <model usable='yes'>Broadwell-noTSX-IBRS</model>
      <model usable='yes'>Broadwell-noTSX</model>
      <model usable='no'>Broadwell-IBRS</model>
      <model usable='no'>Broadwell</model>
      <model usable='yes'>486</model>
    </mode>
  </cpu>
  <memoryBacking supported='yes'>
    <enum name='sourceType'>
      <value>file</value>
      <value>anonymous</value>
      <value>memfd</value>
    </enum>
  </memoryBacking>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>ide</value>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>vnc</value>
        <value>spice</value>
        <value>egl-headless</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>qxl</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'/>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
        <value>builtin</value>
      </enum>
    </rng>
    <filesystem supported='yes'>
      <enum name='driverType'>
        <value>path</value>
        <value>handle</value>
        <value>virtiofs</value>
      </enum>
    </filesystem>
    <tpm supported='yes'>
      <enum name='model'>
        <value>tpm-tis</value>
        <value>tpm-crb</value>
      </enum>
      <enum name='backendModel'>
        <value>passthrough</value>
        <value>emulator</value>
      </enum>
    </tpm>
  </devices>
  <features>
    <gic supported='no'/>
    <vmcoreinfo supported='yes'/>
    <genid supported='yes'/>
    <backingStoreInput supported='yes'/>
    <backup supported='yes'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>


# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 106
model name      : Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
stepping        : 6
microcode       : 0xd000389
cpu MHz         : 3400.000
cache size      : 49152 KB
physical id     : 0
siblings        : 64
core id         : 0
cpu cores       : 32
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 27
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data eibrs_pbrsb
bogomips        : 5200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 57 bits virtual
power management:

[...]

# cat /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
Not affected

Comment 4 Lucia Jelinkova 2023-04-12 08:36:23 UTC
It seems that taa-no is not reported at all. However, RHV should not update the CPU type if there is a host that is not compatible with the new type. Was the host perhaps added after the update? Or was a new version of microcode, libvirt, or QEMU installed in the meantime?

Are all hosts in the cluster non operational or just some of them?

Comment 5 Klaas Demter 2023-04-12 09:37:58 UTC
(In reply to Lucia Jelinkova from comment #4)
> It seems the taa-no is not reported at all. However, the RHV should not
> update the CPU type if there was a host not compatible with the new type.
> Was the host perhaps added after the update? Or was there e.g. a new version
> of microcode, libvirt, or quemu installed meanwhile?
> 
> Are all hosts in the cluster non operational or just some of them?

Hi,
no, this is about me wanting to set the secure version of the CPU.

Greetings
Klaas

Comment 6 Lucia Jelinkova 2023-04-14 12:40:14 UTC
It would still be helpful to have more information about what happened and how. That would help us pinpoint the component that causes this - it could be microcode, qemu, libvirt or RHV itself.

Comment 7 Michal Skrivanek 2023-04-17 10:48:50 UTC
The microcode has recently been updated for this CPU - https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20230214
It could be that this changed the reported capabilities. Do you have any hosts that are not updated yet, i.e. with a version older than d000389?

Maybe we don't need to require taa-no when we disable TSX. I can't recall why we do that; possibly for the cases where you want to enable TSX anyway?

Comment 8 Lucia Jelinkova 2023-04-17 11:22:22 UTC
We copied that configuration from QEMU's latest definition for Icelake Server (at that time it was version 3) to stay aligned with it.

https://github.com/qemu/qemu/blob/6c938efc27c2c9c9b02d574d0522a83dc06c72c8/target/i386/cpu.c#L3602

Comment 9 Klaas Demter 2023-04-17 12:02:38 UTC
(In reply to Lucia Jelinkova from comment #6)
> It would still be beneficial if we could gain more information about
> what/how happened. That would help us to pinpoint the component that causes
> this - it could be microcode, qemu, libvirt or RHV itself.

It's a new RHV setup on new hardware because we have to move datacenters. So there is no single component that was changed that led to this case. Also, those are my first Ice Lake CPUs in RHV :)
So I can only say from a kernel point of view, those machines should be taa-no because:

# cat /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
Not affected

Kernel recognizes it correctly I would say.

(In reply to Michal Skrivanek from comment #7)
> microcode has recently been updated for this CPU - https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20230214
> could be that it changed the reported capabilities, do you have any that are not updated yet, i.e. with older version than d000389?

Same here: I think those servers already came with that version. There is no microcode update during boot; "kernel: microcode: sig=0x606a6, pf=0x1, revision=0xd000389" is supplied by the Dell BIOS. I could try to downgrade it, but since the kernel recognizes it correctly, I don't think that's needed. It seems like libvirt/qemu does not correctly get the taa-no information from the kernel.

Comment 10 Michal Skrivanek 2023-04-17 12:56:26 UTC
It's likely the v4, and probably the new microcode stopped reporting taa-no altogether since it doesn't really make sense with TSX disabled. I'm not sure about all the other changes in v4, but possibly we can just drop the taa-no requirement and be done with it. Out of the box it should always be fine since we use the -noTSX model anyway.

Comment 11 Klaas Demter 2023-04-17 13:08:02 UTC
(In reply to Michal Skrivanek from comment #10)
> it's likely the v4 and probably the new microcode stopped reporting taa-no


But then shouldn't the kernel also be unaware of its state? I mean, qemu and the kernel should detect taa-no the same way, right?

Comment 14 Michal Skrivanek 2023-05-02 11:52:52 UTC
TAA reporting in the kernel is done by https://github.com/torvalds/linux/blob/865fdb08197e657c59e74a35fa32362b12397f58/arch/x86/kernel/cpu/common.c#L1374. It shows "Not affected" because both rtm and tsx-ctrl are missing.
QEMU just reports each feature individually: if the MSR is not there, it does not report taa-no, and then the feature is missing when the host is checked against oVirt's requirements for the CPU model.

It could also be that it never really worked: it was added at a time when the mitigation didn't even exist and was only supposed to be reported in the future.
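
The difference between the two views can be sketched as follows. This is a paraphrase of the kernel check as described in this comment (flag the TAA bug only when the TAA_NO bit is absent and either rtm or the tsx-ctrl MSR is present), not a copy of the kernel source. The function names and the example feature set are hypothetical.

```python
def kernel_says_not_affected(taa_no: bool, rtm: bool, tsx_ctrl: bool) -> bool:
    """Kernel view, as paraphrased in this comment: the TAA bug is flagged
    only if TAA_NO is absent AND (rtm is available OR tsx-ctrl is present)."""
    taa_bug = (not taa_no) and (rtm or tsx_ctrl)
    return not taa_bug


def requirement_met(required_feature: str, reported_features: set) -> bool:
    """Per-feature view: the named feature must literally be reported."""
    return required_feature in reported_features


if __name__ == "__main__":
    # Case described in this comment: no taa-no bit, no rtm, no tsx-ctrl.
    print(kernel_says_not_affected(taa_no=False, rtm=False, tsx_ctrl=False))
    # Hypothetical reported feature set that does not contain taa-no.
    print(requirement_met("taa-no", {"mds-no", "md-clear"}))
```

In that case the first call returns True while the second returns False, which is the mismatch discussed in this report: the sysfs file says "Not affected", yet a requirement on the literal taa-no feature is not satisfied.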

Comment 18 Qin Yuan 2023-06-12 01:38:12 UTC
Verified with:
ovirt-engine-4.5.3.8-2.el8ev.noarch

Steps:
1. Create a cluster with Intel Icelake Server Family cpu type
2. Add an Ice Lake host to the cluster
3. Upgrade the cluster cpu type to Secure Intel Icelake Server Family

Result:
The Ice Lake host status is up after the cluster cpu type is upgraded to Secure Intel Icelake Server Family.

Comment 20 errata-xmlrpc 2023-06-21 19:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Virtualization security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3771

