Bug 2151852

Summary: When doing a cpu-baseline between skylake and cascadelake, cascadelake is selected as baseline.
Product: Red Hat Enterprise Linux 9 Reporter: Jaroslav Suchanek <jsuchane>
Component: libvirtAssignee: Jiri Denemark <jdenemar>
libvirt sub component: General QA Contact: Luyao Huang <lhuang>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: high CC: chhu, cmayapka, dhill, dyuan, dzheng, fdeutsch, fgadkano, fjin, gveitmic, jdenemar, jocran, jsuchane, kchamart, lhuang, lmen, mkalinin, vasanth.mohanraj, virt-maint, xuzhang, yalzhang, ymankad
Version: 9.2Keywords: Triaged, Upstream, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1851227
: 2152081 2152082 (view as bug list) Environment:
Last Closed: 2022-12-14 12:13:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: 8.4.0
Embargoed:
Bug Depends On: 1851227    
Bug Blocks: 1897025, 2084030, 2084031, 2152081, 2152082    

Description Jaroslav Suchanek 2022-12-08 10:29:45 UTC
+++ This bug was initially created as a clone of Bug #1851227 +++

Description of problem:
When doing a cpu-baseline between skylake and cascadelake, cascadelake is selected as baseline:

    <cpu>
      <arch>x86_64</arch>
      <model>Skylake-Server-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='33554537'/>
      <counter name='tsc' frequency='2600005000' scaling='yes'/>
      <topology sockets='1' dies='1' cores='12' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='clflushopt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='ssbd'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>

    <cpu>
      <arch>x86_64</arch>
      <model>Cascadelake-Server</model>
      <vendor>Intel</vendor>
      <microcode version='83886124'/>
      <counter name='tsc' frequency='2099999000' scaling='yes'/>
      <topology sockets='1' dies='1' cores='20' threads='2'/>
      <feature name='ds'/>
      <feature name='acpi'/>
      <feature name='ss'/>
      <feature name='ht'/>
      <feature name='tm'/>
      <feature name='pbe'/>
      <feature name='dtes64'/>
      <feature name='monitor'/>
      <feature name='ds_cpl'/>
      <feature name='vmx'/>
      <feature name='smx'/>
      <feature name='est'/>
      <feature name='tm2'/>
      <feature name='xtpr'/>
      <feature name='pdcm'/>
      <feature name='dca'/>
      <feature name='osxsave'/>
      <feature name='tsc_adjust'/>
      <feature name='cmt'/>
      <feature name='intel-pt'/>
      <feature name='pku'/>
      <feature name='ospke'/>
      <feature name='md-clear'/>
      <feature name='stibp'/>
      <feature name='arch-capabilities'/>
      <feature name='xsaves'/>
      <feature name='mbm_total'/>
      <feature name='mbm_local'/>
      <feature name='invtsc'/>
      <feature name='rdctl-no'/>
      <feature name='ibrs-all'/>
      <feature name='skip-l1dfl-vmentry'/>
      <feature name='mds-no'/>
      <feature name='tsx-ctrl'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='2048'/>
      <pages unit='KiB' size='1048576'/>
    </cpu>


[root@undercloud-0-rhosp16 ~]# virsh cpu-baseline allo5 
<cpu mode='custom' match='exact'>
  <model fallback='allow'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='intel-pt'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='disable' name='avx512vnni'/>
</cpu>


Version-Release number of selected component (if applicable):
4.5, 6.3 and 6.4

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:


--- Additional comment from Jiri Denemark on 2022-05-06 17:38:25 CEST ---

Fixed upstream by

commit 48341b025acdd04a66696a709c7b09b3bfd42acf
Refs: v8.3.0-42-g48341b025a
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 26 15:06:30 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri May 6 17:33:47 2022 +0200

    cpu_x86: Penalize disabled features when computing CPU model

    For finding the best matching CPU model for a given set of features
    while we don't know the CPU signature (i.e., when computing a baseline
    CPU model) we've been using a "shortest list of features" heuristics.
    This works well if new CPU models are supersets of older models, but
    that's not always the case. As a result it may actually select a new CPU
    model as a baseline while removing some features from it to make it
    compatible with older models. This is in general worse than using an old
    CPU model with a bunch of added features as a guest OS or apps may crash
    when using features that were disabled.

    On the other hand we don't want to end up with a very old model which
    would guarantee no disabled features as it could stop a guest OS or apps
    from using some features provided by the CPU because they would not
    expect them on such an old CPU.

    This patch changes the heuristics to something in between. Enabled and
    disabled features are counted separately so that a CPU model requiring
    some features to be disabled looks worse than a model with fewer
    disabled features even if its complete list of features is longer. The
    penalty given for each additional disabled feature gets bigger to make
    longer list of disabled features look even worse.

    https://bugzilla.redhat.com/show_bug.cgi?id=1851227

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit bb6cedd2082599323257ee0df18c93a6e0551b0b
Refs: v8.3.0-43-gbb6cedd208
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Apr 29 10:35:02 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri May 6 17:33:47 2022 +0200

    cpu_x86: Ignore enabled features for input models in x86DecodeUseCandidate

    While we don't want to aim for the shortest list of disabled features in
    the baseline result (it would select a very old model), we want to do so
    while looking at any of the input models for which we're trying to
    compute a baseline CPU model. Given a set of input models, we always
    want to take the least capable one of them (i.e., the one with shortest
    list of disabled features) or a better model which is not one of the
    input models.

    So when considering an input model, we just check whether its list of
    disabled features is shorter than the currently best one. When looking
    at other models we check both enabled and disabled features while
    penalizing disabled features as implemented by the previous patch.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

Comment 4 Jiri Denemark 2022-12-14 12:13:07 UTC
Already fixed in libvirt 8.5.0 which was released in RHEL 9.1.0.