Bug 1687515

Summary: Enhance detection of host CPU model to avoid guesses based on feature list length [rhel-7.6.z]
Product: Red Hat Enterprise Linux 7 Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: jiyan <jiyan>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6 CC: dyuan, fjin, gveitmic, hfukumot, jdenemar, lhuang, lmen, mvanderw, xuzhang, yalzhang
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-4.5.0-10.el7_6.7 Doc Type: Bug Fix
Doc Text:
Cause: Some CPUs were incorrectly detected as a different CPU model, e.g., some Broadwell CPUs were detected as the Skylake-Client CPU model.
Consequence: VMs configured with a host-model CPU on a Broadwell host could be started with the Skylake-Client CPU model, which could cause a noticeable slowdown in some workloads.
Fix: The host CPU model detection algorithm was enhanced to cover more CPU signatures (family and model numbers) found in physical CPUs. The algorithm uses the list of real-world CPU signatures to find the appropriate CPU model regardless of the specific features a particular CPU supports.
Result: All Broadwell CPUs should now be correctly detected as one of the variants of the Broadwell CPU model, eliminating the slowdown for VMs using a host-model CPU.
Story Points: ---
Clone Of: 1558558 Environment:
Last Closed: 2019-04-23 14:29:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1558558    
Bug Blocks:    

Description RAD team bot copy to z-stream 2019-03-11 16:18:44 UTC
This bug has been copied from bug #1558558 and has been proposed to be backported to 7.6 z-stream (EUS).

Comment 10 jiyan 2019-03-28 03:48:38 UTC
Version:
libvirt-4.5.0-10.el7_6.7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
kernel-3.10.0-957.el7.x86_64

Steps:
1. Check the output of 'virsh capabilities' and 'virsh domcapabilities' on a 'Broadwell' physical machine.
# lscpu 
...
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

# virsh capabilities
...
  <host>
    <uuid>4f11c612-e27d-11e7-9a7d-0894ef59df54</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Broadwell</model>
      <vendor>Intel</vendor>
      <microcode version='184549410'/>
      <topology sockets='1' cores='10' threads='2'/>
      <feature name='vme'/>
...

# virsh domcapabilities
...
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='vme'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='f16c'/>
      <feature policy='require' name='rdrand'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='xsaveopt'/>
      <feature policy='require' name='pdpe1gb'/>
      <feature policy='require' name='abm'/>
      <feature policy='require' name='invtsc'/>
    </mode>
...

2. Configure the VM with a 'custom' CPU definition and start it.
# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A3
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>Broadwell</model>
  </cpu>

# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

3. After Step-2, check the CPU conf in active dumpxml and qemu cmd line.
# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A20
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Broadwell</model>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='abm'/>
  </cpu>

# ps -ef |grep avocado-vt-vm1
...
-cpu Broadwell,rtm=on,hle=on
...

Hi Jiri,
Could you please check whether the 'qemu cmd line' in step 3 is normal?
Why does the qemu cmd line not list the features in the form vme=on, f16c=on, ...? I also do not know which conf corresponds to 'rtm' and 'hle' in the qemu cmd line above.

I am a little confused, because the CPU-related XML and the qemu cmd line usually look like the following:
For example:
https://bugzilla.redhat.com/show_bug.cgi?id=1558558#c6
# virsh dumpxml test1 |grep "<cpu" -A17
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Skylake-Client</model>       *********
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='disable' name='mpx'/>
    <feature policy='disable' name='xsavec'/>
    <feature policy='disable' name='xgetbv1'/>
  </cpu>

# ps -ef |grep test1
...
-cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off
...
Each 'feature' conf in dumpxml should correspond to an entry in the qemu cmd line.

Comment 12 Jiri Denemark 2019-03-28 08:40:23 UTC
(In reply to jiyan from comment #10)

This is correct. Libvirt in RHEL adds rtm=on and hle=on to any Haswell or
Broadwell CPU model (except for the noTSX variants) because QEMU was once
released with a downstream modification of these CPU models, which removed hle
and rtm. Thus to make sure Broadwell and Haswell always mean rtm/hle is on no
matter what QEMU version is used, we explicitly add these two features on the
command line.
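
The rule described above can be sketched as follows. This is only an illustration of the behaviour as stated in this comment, not libvirt's code: the function names (add_tsx_features, cpu_argument) and the name-based check for "noTSX" are assumptions made for the example.

```python
def add_tsx_features(model, features):
    """Return a copy of `features` with rtm=on and hle=on appended when the
    model is a Haswell or Broadwell variant that is not a noTSX variant.

    Hypothetical helper: matching on the model name is an assumption of
    this sketch, not necessarily how libvirt decides.
    """
    base = model.split("-")[0]
    is_tsx_model = base in ("Haswell", "Broadwell") and "noTSX" not in model
    result = list(features)
    if is_tsx_model:
        for name in ("rtm", "hle"):
            flag = name + "=on"
            if flag not in result:
                result.append(flag)
    return result


def cpu_argument(model, features):
    """Build the value passed to QEMU's -cpu option for this sketch."""
    return ",".join([model] + add_tsx_features(model, features))


# A custom-mode Broadwell CPU with no features in the inactive XML:
print(cpu_argument("Broadwell", []))
```

With an empty feature list this prints "Broadwell,rtm=on,hle=on", which is the -cpu value seen in step 3 of comment #10.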

The features which appear in the domain XML once the domain is started are
extra features which QEMU enabled on top of what we asked for. In most cases
this is caused by a difference between the CPU model definition in libvirt and
QEMU. They sometimes add new features to the CPU models for some machine types.
So while we think the CPU model corresponds to some set of features, QEMU's
definition may use a bit different set of features for the same CPU model.
Once we start a domain, we ask QEMU for such features and add them to the XML
so that we can ensure the same feature set is used after save/restore or
migration.

In other words, CPU features found in the domain XML before the domain is
started will appear on the command line. Those which get into the XML once
the domain is running will only appear on the command line after you migrate
such a domain to another host or perform a save/restore operation (most easily
via virsh managedsave).
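
To make the XML-to-command-line correspondence concrete, here is a minimal sketch that turns a <cpu> element into a -cpu value. It is an approximation based only on the examples in this bug: the function name cpu_xml_to_arg is hypothetical, only the 'require' and 'disable' policies seen in this report are handled, and the extra rtm/hle handling for Haswell and Broadwell is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Only the two policies that appear in this bug report are mapped;
# any other policy value is rejected rather than guessed at.
POLICY_TO_FLAG = {"require": "on", "disable": "off"}


def cpu_xml_to_arg(cpu_xml):
    """Convert a <cpu> XML snippet into a comma-separated -cpu value."""
    cpu = ET.fromstring(cpu_xml)
    model = cpu.findtext("model")
    if not model:
        raise ValueError("the <cpu> element has no model name")
    parts = [model.strip()]
    for feature in cpu.findall("feature"):
        policy = feature.get("policy")
        if policy not in POLICY_TO_FLAG:
            raise ValueError("unhandled feature policy: %r" % policy)
        parts.append("%s=%s" % (feature.get("name"), POLICY_TO_FLAG[policy]))
    return ",".join(parts)


# The Skylake-Client definition quoted in comment #10.
SKYLAKE_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Skylake-Client</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='disable' name='mpx'/>
  <feature policy='disable' name='xsavec'/>
  <feature policy='disable' name='xgetbv1'/>
</cpu>
"""

print(cpu_xml_to_arg(SKYLAKE_XML))
```

For the Skylake-Client XML this reproduces the -cpu value quoted in comment #10. Applied to the XML of a running domain it would also list the features QEMU added after start, which is why those only show up on a real command line after migration or save/restore.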

BTW, this bug is mostly about host-model CPUs since custom mode CPU
definitions are not really affected by what we think the host CPU model is.
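
For the host-model side, the idea from the Doc Text (prefer a model whose known CPU signatures match the host, instead of guessing from the feature list length) could look roughly like the sketch below. This is a simplified assumption about the approach, not libvirt's algorithm: the function name pick_host_model, the data layout and the signature values are illustrative, and the extra-feature lists are simply borrowed from the XML snippets in comment #10.

```python
def pick_host_model(candidates, host_signature):
    """Pick a CPU model name for the host.

    candidates: list of dicts with 'name', 'signatures' (a set of
    (family, model) tuples) and 'extra_features' (features that would
    have to be listed on top of the model).
    """
    # Prefer a candidate whose known signatures include the host's.
    for candidate in candidates:
        if host_signature in candidate["signatures"]:
            return candidate["name"]
    # Otherwise fall back to the guess this bug is about: the model
    # that needs the shortest list of extra features.
    return min(candidates, key=lambda c: len(c["extra_features"]))["name"]


# Illustrative data only; not taken from libvirt's CPU map.
CANDIDATES = [
    {
        "name": "Broadwell",
        "signatures": {(6, 79)},
        "extra_features": ["vme", "ss", "f16c", "rdrand", "hypervisor", "arat",
                           "tsc_adjust", "xsaveopt", "pdpe1gb", "abm", "invtsc"],
    },
    {
        "name": "Skylake-Client",
        "signatures": {(6, 94)},
        "extra_features": ["ss", "hypervisor", "tsc_adjust", "pdpe1gb",
                           "mpx", "xsavec", "xgetbv1"],
    },
]

print(pick_host_model(CANDIDATES, (6, 79)))  # signature match
print(pick_host_model(CANDIDATES, (6, 0)))   # no match, shortest list wins
```

In this sketch a host whose signature is listed gets Broadwell even though the Skylake-Client candidate has the shorter feature list, while a host with an unlisted signature still falls back to the shorter list.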

Comment 13 jiyan 2019-03-28 09:23:53 UTC
Adding the following test steps:

# virsh domstate avocado-vt-vm1-test
shut off

# virsh dumpxml avocado-vt-vm1-test|grep "<cpu" -A5
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>

# virsh start avocado-vt-vm1-test
Domain avocado-vt-vm1-test started

# virsh dumpxml avocado-vt-vm1-test|grep "<cpu" -A15
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Broadwell</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
  </cpu>

# ps -ef |grep avocado-vt-vm1-test 
...
-cpu Broadwell,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,rtm=on,hle=on
...

Comment 16 jiyan 2019-04-10 01:52:47 UTC
Adding the following scenarios, all passed.
S1: Upgrade physical host from 7.5.z to 7.6.z, check the output of 'virsh capabilities' and 'virsh domcapabilities', dumpxml of active VM – PASS
S2: Upgrade physical host(src) from 7.5.z to 7.6.z, migrate VM from src(7.6.z) to dst (7.5.z), and then migrate VM back, check vm status, active dumpxml and console – PASS
S3: Upgrade physical host(src) from 7.5.z to 7.6.z, migrate VM from src(7.6.z) to dst (7.6.z), and then migrate VM back, check vm status, active dumpxml and console – PASS

Comment 18 errata-xmlrpc 2019-04-23 14:29:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0821