Bug 1687515
Summary: | Enhance detection of host CPU model to avoid guesses based on feature list length [rhel-7.6.z] | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | RAD team bot copy to z-stream <autobot-eus-copy> |
Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
Status: | CLOSED ERRATA | QA Contact: | jiyan <jiyan> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.6 | CC: | dyuan, fjin, gveitmic, hfukumot, jdenemar, lhuang, lmen, mvanderw, xuzhang, yalzhang |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-4.5.0-10.el7_6.7 | Doc Type: | Bug Fix |
Doc Text: |
Cause: Some CPUs were incorrectly detected as a different CPU model; for example, some Broadwell CPUs were detected as the Skylake-Client CPU model.
Consequence: VMs using a host-model CPU on a Broadwell host could be started with the Skylake-Client CPU model, which could cause a noticeable slowdown in some workloads.
Fix: The host CPU model detection algorithm was enhanced to cover more CPU signatures (family and model numbers) found in physical CPUs. The algorithm uses the list of real-world CPU signatures to find the appropriate CPU model regardless of the specific features a particular CPU supports (a minimal sketch of this lookup follows the summary table below).
Result: All Broadwell CPUs should now be correctly detected as one of the variants of the Broadwell CPU model, eliminating the slowdown for VMs using a host-model CPU.
|
Story Points: | --- |
Clone Of: | 1558558 | Environment: | |
Last Closed: | 2019-04-23 14:29:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1558558 | ||
Bug Blocks: |
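The following is a minimal, hypothetical sketch of the signature-based lookup described in the Doc Text above. The (family, model) table and the fallback heuristic are illustrative assumptions only; libvirt's real CPU model data lives in its cpu_map definitions and the matching logic is implemented in C.

```python
# Hedged sketch: signature-based host CPU model detection.
# The signature table below is illustrative, not libvirt's actual data.

# (family, model) -> canonical CPU model name -- example entries, not exhaustive.
KNOWN_SIGNATURES = {
    (6, 61): "Broadwell",        # Broadwell-U/Y
    (6, 71): "Broadwell",        # Broadwell-H
    (6, 79): "Broadwell",        # Broadwell-EP, e.g. Xeon E5-2630 v4
    (6, 86): "Broadwell",        # Broadwell-DE
    (6, 94): "Skylake-Client",   # Skylake desktop/mobile
}

def detect_host_model(family, model, host_features, candidate_models):
    """Return the CPU model matching the host signature when it is known;
    otherwise fall back to the old feature-overlap guess."""
    name = KNOWN_SIGNATURES.get((family, model))
    if name is not None:
        return name
    # Old-style heuristic: pick the candidate model whose feature set overlaps
    # the host's feature set the most (this guess is what led to Broadwell
    # hosts being misdetected as Skylake-Client).
    return max(candidate_models,
               key=lambda m: len(candidate_models[m] & host_features))
```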
Description
RAD team bot copy to z-stream
2019-03-11 16:18:44 UTC
Version:
libvirt-4.5.0-10.el7_6.7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
kernel-3.10.0-957.el7.x86_64

Steps:
1. Check the output of 'virsh capabilities' and 'virsh domcapabilities' on a 'Broadwell' physical machine.

# lscpu
...
Model name: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

# virsh capabilities
...
<host>
  <uuid>4f11c612-e27d-11e7-9a7d-0894ef59df54</uuid>
  <cpu>
    <arch>x86_64</arch>
    <model>Broadwell</model>
    <vendor>Intel</vendor>
    <microcode version='184549410'/>
    <topology sockets='1' cores='10' threads='2'/>
    <feature name='vme'/>
    ...

# virsh domcapabilities
...
<cpu>
  <mode name='host-passthrough' supported='yes'/>
  <mode name='host-model' supported='yes'>
    <model fallback='forbid'>Broadwell</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
    <feature policy='require' name='invtsc'/>
  </mode>
...

2. Start a VM with a 'custom' CPU configuration.

# virsh domstate avocado-vt-vm1
shut off

# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A3
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>Broadwell</model>
</cpu>

# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

3. After step 2, check the CPU configuration in the active dumpxml and on the qemu command line.

# virsh dumpxml avocado-vt-vm1 |grep "<cpu" -A20
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Broadwell</model>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='f16c'/>
  <feature policy='require' name='rdrand'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='require' name='abm'/>
</cpu>

# ps -ef |grep avocado-vt-vm1
... -cpu Broadwell,rtm=on,hle=on ...
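As a side note on the checks in step 1, the detected host CPU model can also be read programmatically. Below is a minimal sketch using the libvirt Python bindings, assuming a local qemu:///system connection; the x86_64/KVM parameters passed to getDomainCapabilities are assumptions for this example.

```python
# Minimal sketch: read the detected host CPU model from capabilities and
# domcapabilities via the libvirt Python bindings (assumes qemu:///system).
import libvirt
import xml.etree.ElementTree as ET

conn = libvirt.open("qemu:///system")

# Equivalent of 'virsh capabilities': <host><cpu><model> holds the detected model.
caps = ET.fromstring(conn.getCapabilities())
print("capabilities host model:", caps.findtext("./host/cpu/model"))

# Equivalent of 'virsh domcapabilities' for the default emulator on x86_64/KVM.
domcaps = ET.fromstring(conn.getDomainCapabilities(None, "x86_64", None, "kvm"))
model = domcaps.find("./cpu/mode[@name='host-model']/model")
print("host-model:", model.text, "(fallback=%s)" % model.get("fallback"))

conn.close()
```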
Hi Jiri,
Could you please help to check whether the qemu command line in step 3 is normal? Why doesn't the qemu command line show the required features (vme=on, f16c=on, ...), and which configuration do 'rtm' and 'hle' on the command line above correspond to? I am a little confused, because the CPU-related XML and the qemu command line usually look like the following example from https://bugzilla.redhat.com/show_bug.cgi?id=1558558#c6:

# virsh dumpxml test1 |grep "<cpu" -A17
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Skylake-Client</model>  *********
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='disable' name='mpx'/>
  <feature policy='disable' name='xsavec'/>
  <feature policy='disable' name='xgetbv1'/>
</cpu>

# ps -ef |grep test1
... -cpu Skylake-Client,ss=on,hypervisor=on,tsc_adjust=on,pdpe1gb=on,mpx=off,xsavec=off,xgetbv1=off ...

Each 'feature' entry in the dumpxml should correspond to an entry on the qemu command line.

(In reply to jiyan from comment #10)
This is correct. Libvirt in RHEL adds rtm=on and hle=on to any Haswell or Broadwell CPU model (except for the noTSX variants) because QEMU was once released with a downstream modification of these CPU models which removed hle and rtm. Thus, to make sure Broadwell and Haswell always mean rtm/hle are on no matter which QEMU version is used, we explicitly add these two features on the command line.

The features which appear in the domain XML once the domain is started are extra features which QEMU enabled on top of what we asked for. In most cases this is caused by a difference between the CPU model definitions in libvirt and QEMU: QEMU sometimes adds new features to the CPU models for some machine types. So while we think the CPU model corresponds to some set of features, QEMU's definition may use a slightly different set of features for the same CPU model. Once we start a domain, we ask QEMU for such features and add them to the XML so that we can force the same feature set to be used after save/restore or migration. In other words, CPU features found in the domain XML before the domain is started will appear on the command line; those which get into the XML once the domain is running will only appear on the command line after you migrate such a domain to another host or perform a save/restore operation (most easily via virsh managedsave). (A small sketch of comparing the inactive and live CPU XML this way appears at the end of this report.)

BTW, this bug is mostly about host-model CPUs, since custom mode CPU definitions are not really affected by what we think the host CPU model is.

Adding the following test steps:

# virsh domstate avocado-vt-vm1-test
shut off

# virsh dumpxml avocado-vt-vm1-test|grep "<cpu" -A5
<cpu mode='host-model' check='partial'>
  <model fallback='allow'/>
</cpu>

# virsh start avocado-vt-vm1-test
Domain avocado-vt-vm1-test started

# virsh dumpxml avocado-vt-vm1-test|grep "<cpu" -A15
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Broadwell</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='f16c'/>
  <feature policy='require' name='rdrand'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='xsaveopt'/>
  <feature policy='require' name='pdpe1gb'/>
  <feature policy='require' name='abm'/>
</cpu>

# ps -ef |grep avocado-vt-vm1-test
.. -cpu Broadwell,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,rtm=on,hle=on ...

Adding the following scenarios, all passed:

S1: Upgrade the physical host from 7.5.z to 7.6.z, check the output of 'virsh capabilities' and 'virsh domcapabilities' and the dumpxml of the active VM – PASS
S2: Upgrade the physical host (src) from 7.5.z to 7.6.z, migrate the VM from src (7.6.z) to dst (7.5.z), then migrate the VM back; check the VM status, active dumpxml and console – PASS
S3: Upgrade the physical host (src) from 7.5.z to 7.6.z, migrate the VM from src (7.6.z) to dst (7.6.z), then migrate the VM back; check the VM status, active dumpxml and console – PASS

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0821
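To illustrate the point above about features QEMU enables on top of what libvirt asked for, here is a minimal sketch that compares the CPU feature lists of the inactive and the live domain XML. The domain name is taken from the test steps above; the qemu:///system URI is an assumption for this example.

```python
# Sketch: features present in the live CPU XML but not in the inactive XML are
# the extras QEMU enabled on top of what libvirt requested.
import libvirt
import xml.etree.ElementTree as ET

def cpu_features(xml_desc):
    """Collect the names of all <feature> elements under <cpu>."""
    cpu = ET.fromstring(xml_desc).find("./cpu")
    if cpu is None:
        return set()
    return {f.get("name") for f in cpu.findall("feature")}

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("avocado-vt-vm1-test")   # domain name from the steps above

inactive = cpu_features(dom.XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE))
live = cpu_features(dom.XMLDesc(0))              # live (running) definition

print("features added by QEMU after start:", sorted(live - inactive))
conn.close()
```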