Bug 1633150

Summary: Cross migration from RHEL7.5 to RHEL7.6 fails with cpu flag stibp
Product: Red Hat Enterprise Linux 7 Reporter: Fangge Jin <fjin>
Component: qemu-kvm-rhev Assignee: Eduardo Habkost <ehabkost>
Status: CLOSED ERRATA QA Contact: jingzhao <jinzhao>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6 CC: chayang, dgilbert, ehabkost, hhuang, jdenemar, jherrman, jinzhao, jiyan, juzhang, kchamart, mrezanin, mtessun, salmy, toneata, virt-maint, xuzhang, zhguo
Target Milestone: rc Keywords: Regression, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-19.el7 Doc Type: If docs needed, set a value
Doc Text:
Previously, migrating virtual machines (VMs) from a Red Hat Enterprise Linux 7.5 host with the Single Thread Indirect Branch Predictors (STIBP) flag set failed in some cases. This update ensures that the flag is consistently added to VMs with an AMD64 or Intel 64 virtual CPU (vCPU), which prevents the described problem from occurring.
Story Points: ---
Clone Of:
: 1638077 1639446 Environment:
Last Closed: 2019-08-22 09:19:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1638077, 1651787    

Description Fangge Jin 2018-09-26 09:50:37 UTC
Description of problem:
On a RHEL7.5/7.4/7.3/7.2/6.10 host with the cpu flag stibp, start a guest with cpu mode="host-model" (libvirt adds stibp=on to the qemu command line), then migrate the guest to a RHEL7.6 host that also has the cpu flag stibp. The migration fails:

# virsh migrate rhel7-min qemu+ssh://10.66.4.101/system --live --verbose --p2p
error: internal error: process exited while connecting to monitor: 2018-08-14T13:33:58.097151Z qemu-kvm: can't apply global IvyBridge-IBRS-x86_64-cpu.stibp=on: Property '.stibp' not found


Version-Release number of selected component (if applicable):
RHEL7.5:
libvirt-3.9.0-14.el7_5.7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.7.x86_64

RHEL7.6:
libvirt-4.5.0-10.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the RHEL7.5 host, check the cpu flag:
# lscpu |grep stibp
.....stibp..... intel_stibp

# virsh domcapabilities|grep stibp
      <feature policy='require' name='stibp'/>

2. Prepare a guest with cpu mode='host-model':
# virsh dumpxml rhel7-min
...
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
...

3. Start the guest; libvirt adds stibp=on to the qemu command line:
...  -cpu IvyBridge-IBRS,ss=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,stibp=on,ssbd=on,xsaveopt=on....

4. Migrate guest to RHEL7.6 host:
# virsh migrate rhel7-min qemu+ssh://10.66.4.101/system --live --verbose --p2p
error: internal error: process exited while connecting to monitor: 2018-08-14T13:33:58.097151Z qemu-kvm: can't apply global IvyBridge-IBRS-x86_64-cpu.stibp=on: Property '.stibp' not found

Actual results:
Migration fails

Expected results:
Migration succeeds
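
For illustration only, the failure in step 4 can be modelled as a property lookup on the destination. The sketch below is a minimal Python example, not QEMU or libvirt code: it splits the -cpu value from step 3 into a model name and its name=on/off properties, then compares them with a hypothetical set of properties that the destination QEMU knows for that model. The helper name parse_cpu_arg and the contents of known_on_destination are assumptions made for this example.

```python
def parse_cpu_arg(arg):
    """Split a QEMU -cpu value into (model, {property: enabled}).

    Simplification: only the name=on / name=off form seen in this
    report is handled; anything else is treated as disabled.
    """
    model, *props = arg.split(",")
    features = {}
    for prop in props:
        name, _, value = prop.partition("=")
        features[name] = (value == "on")
    return model, features


# The -cpu value from step 3, without the elided tail.
model, features = parse_cpu_arg(
    "IvyBridge-IBRS,ss=on,pcid=on,hypervisor=on,arat=on,"
    "tsc_adjust=on,stibp=on,ssbd=on,xsaveopt=on"
)

# Hypothetical: properties the destination QEMU accepts for this model.
# "stibp" is left out on purpose to mirror the reported error.
known_on_destination = {
    "ss", "pcid", "hypervisor", "arat", "tsc_adjust", "ssbd", "xsaveopt",
}

missing = sorted(
    name for name, enabled in features.items()
    if enabled and name not in known_on_destination
)
print(model, missing)
```

Any name left in missing corresponds to a property the destination would reject, which matches the "Property '.stibp' not found" message above.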

Additional info:

Comment 3 Jiri Denemark 2018-09-26 13:39:39 UTC
Is the feature actually enabled for the running guest or does QEMU complain
that it can't be enabled?

You should be able to check this with libvirt by starting a domain with
host-model and running "virsh dumpxml $DOMAIN". The XML should contain a CPU
model and features which were actually enabled by QEMU.

Comment 4 Fangge Jin 2018-09-26 14:04:12 UTC
(In reply to Jiri Denemark from comment #3)
> Is the feature actually enabled for the running guest or does QEMU complain
> that it can't be enabled?
> 
> You should be able to check this with libvirt by starting a domain with
> host-model and running "virsh dumpxml $DOMAIN". The XML should contain a CPU
> model and features which were actually enabled by QEMU.

After guest starts:
# virsh dumpxml rhel7-min
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='pcid'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaveopt'/>

Comment 5 Jiri Denemark 2018-09-26 14:21:35 UTC
OK, so it confirms the feature was actually enabled by QEMU. And libvirt is
trying to make sure the feature does not disappear once the domain is
migrated.
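
The check described in comment 3 can also be scripted. The following is a minimal sketch, assuming the <cpu> element has been saved from the "virsh dumpxml" output as a complete, well-formed fragment; the excerpt in comment 4 is cut off, so the closing </cpu> tag here was added for the example. The function name required_features is hypothetical.

```python
import xml.etree.ElementTree as ET

# <cpu> element from comment 4; the closing tag is added here because
# the pasted excerpt ends before it.
CPU_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>IvyBridge-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>
"""


def required_features(cpu_xml):
    """Return the names of all <feature policy='require'> children."""
    root = ET.fromstring(cpu_xml.strip())
    return [
        feature.get("name")
        for feature in root.findall("feature")
        if feature.get("policy") == "require"
    ]


names = required_features(CPU_XML)
print("stibp" in names)
```

If stibp appears in the list, the feature was enabled for the running guest, so the destination has to provide it as well.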

Comment 13 jingzhao 2018-12-13 08:25:10 UTC
Verified it with qemu-kvm-rhev-2.12.0-20.el7.x86_64

Detailed info:
1. Boot guest with "cpu IvyBridge,ss=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,stibp=on,ssbd=on,xsaveopt=on"
2. Migrate from RHEL.7.5 to RHEL.7.6
3. Migration succeeds

Changed to VERIFIED according to the above test result.

Comment 14 jiyan 2019-06-14 06:05:03 UTC
Hi Eduardo, I encountered a similar problem in 7.6.z as well. Could you please have a look at it? Thank you!

Description:
Computing the cpu baseline from "virsh capabilities" output fails in RHEL-7.6.z because of "intel_pt", while it works well with "virsh domcapabilities" output.

How reproducible:
100%

Version:
RHEL-7.7 host
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)

# rpm -qa libvirt qemu-kvm-rhev kernel
libvirt-4.5.0-22.el7.x86_64
kernel-3.10.0-1053.el7.x86_64
qemu-kvm-rhev-2.12.0-32.el7.x86_64

RHEL-7.6 host
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# rpm -qa libvirt qemu-kvm-rhev kernel
qemu-kvm-rhev-2.12.0-18.el7_6.7.x86_64
kernel-3.10.0-957.21.2.el7.x86_64
libvirt-4.5.0-10.el7_6.10.x86_64

Steps:
1. In rhel-7.7:
# virsh capabilities > cap1.xml
# virsh domcapabilities > dom1.xml

2. In rhel-7.6:
# virsh domcapabilities > dom2.xml
# virsh capabilities > cap2.xml

3. In rhel-7.7 compute cpu baseline:
# cat cap1.xml cap2.xml >> capall.xml

# cat dom1.xml dom2.xml >> domall.xml

# virsh hypervisor-cpu-baseline capall.xml
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Server-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='dtes64'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='smx'/>
  <feature policy='require' name='est'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='osxsave'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='ospke'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='invtsc'/>
</cpu>

# virsh hypervisor-cpu-baseline domall.xml
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Server-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='invtsc'/>
</cpu>

4. In rhel-7.6 compute cpu baseline:
# cat cap1.xml cap2.xml >> capall.xml

# cat dom1.xml dom2.xml >> domall.xml

# virsh hypervisor-cpu-baseline capall.xml 
error: internal error: Unknown CPU feature intel-pt

# virsh hypervisor-cpu-baseline domall.xml 
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Skylake-Server-IBRS</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='clflushopt'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='invtsc'/>
</cpu>

Actual result:
As step-4 shows

Expected result:
Since "virsh hypervisor-cpu-baseline" accepts the output of both "virsh capabilities" and "virsh domcapabilities", the results should be the same.
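
To make the difference between the two step-3 results easier to see, the feature names can be compared as sets. This is only a sketch: the names are copied by hand from the two outputs above, and the variable names are made up for the example.

```python
# Features listed by "virsh hypervisor-cpu-baseline capall.xml" in step 3.
cap_baseline = {
    "ds", "acpi", "ss", "ht", "tm", "pbe", "dtes64", "monitor", "ds_cpl",
    "vmx", "smx", "est", "tm2", "xtpr", "pdcm", "dca", "osxsave",
    "tsc_adjust", "clflushopt", "pku", "ospke", "md-clear", "stibp",
    "ssbd", "xsaves", "invtsc",
}

# Features listed by "virsh hypervisor-cpu-baseline domall.xml" in step 3.
dom_baseline = {
    "ss", "hypervisor", "tsc_adjust", "clflushopt", "pku", "md-clear",
    "stibp", "ssbd", "invtsc",
}

common = cap_baseline & dom_baseline      # reported by both
only_cap = cap_baseline - dom_baseline    # only from capabilities
only_dom = dom_baseline - cap_baseline    # only from domcapabilities

print(sorted(common))
print(sorted(only_cap))
print(sorted(only_dom))
```

Both results use the same model, Skylake-Server-IBRS, and differ only in the extra features listed next to it.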

Additional info:
RHEL-7.6
# lscpu |grep intel
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl intel_stibp flush_l1d

RHEL-7.7
# lscpu |grep intel
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl intel_stibp flush_l1d

Comment 15 Jiri Denemark 2019-06-14 14:19:25 UTC
(In reply to jiyan from comment #14)
Hi, there are several issues in your test case.

The expectation is incorrect. The hypervisor-cpu-baseline command accepts any
host CPU XML (from capabilities or domain capabilities), but that doesn't mean
the result should be the same, because the host CPU models are different. And
the documentation says the domain capabilities are the best source for the best
result.

If you're calling baseline on CPU models gathered from hosts which do not
contain the same version of packages, you logically need to run the baseline
API on the newest host since the older host(s) may not know some new features
reported only by new versions. These features will not be included in the
result because they are not supported on all hosts.
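
The point about running the baseline on the newest host can be shown with a toy model. The sketch below is not libvirt's algorithm (the real API also selects a CPU model and reports features relative to it); it treats each host as a plain set of feature names, rejects names the computing host does not know, and otherwise returns the intersection. The function name baseline, the feature sets, and the error text are assumptions chosen to mirror this report.

```python
def baseline(host_feature_sets, known_features):
    """Toy baseline: intersect the hosts' feature sets.

    known_features stands in for the feature names understood by the
    host that runs the computation.
    """
    for features in host_feature_sets:
        unknown = sorted(features - known_features)
        if unknown:
            raise ValueError(f"Unknown CPU feature {unknown[0]}")
    return set.intersection(*host_feature_sets)


# Illustrative data only, not taken from real capabilities XML.
newer_host = {"ss", "stibp", "ssbd", "intel-pt"}
older_host = {"ss", "stibp", "ssbd"}

known_on_newer = newer_host | older_host  # newest packages know every name
known_on_older = set(older_host)          # older packages lack "intel-pt"

# On the newer host the computation succeeds and drops "intel-pt",
# because the older host does not report it.
print(sorted(baseline([newer_host, older_host], known_on_newer)))

# On the older host the same input is rejected.
try:
    baseline([newer_host, older_host], known_on_older)
except ValueError as err:
    print(err)
```

In this model the result is the same wherever it can be computed; only the older host fails, because it cannot interpret one of the input names.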

Comment 17 errata-xmlrpc 2019-08-22 09:19:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553