Bug 1539484

Summary: libvirt update some new cpu features to an old guest after upgrade from 7.3
Product: Red Hat Enterprise Linux 7
Reporter: Luyao Huang <lhuang>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED WONTFIX
QA Contact: jiyan <jiyan>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.5
CC: dyuan, jdenemar, lizhu, xuzhang, yalzhang, zpeng
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-19 12:03:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Luyao Huang 2018-01-29 02:14:45 UTC
Description of problem:
After upgrading from 7.3, libvirt adds some new CPU features to an old (already running) guest,
which means the guest cannot be migrated back to the old libvirt.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-9.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. The guest XML before the upgrade to 7.5:

  <cpu mode='host-model' match='exact'>
    <model fallback='forbid'>Opteron_G3</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='pclmuldq'/>
    <feature policy='require' name='ssse3'/>
    <feature policy='require' name='fma'/>
    <feature policy='require' name='sse4.1'/>
    <feature policy='require' name='sse4.2'/>
    <feature policy='require' name='movbe'/>
    <feature policy='require' name='aes'/>
    <feature policy='require' name='xsave'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='avx'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='fsgsbase'/>
    <feature policy='require' name='bmi1'/>
    <feature policy='require' name='avx2'/>
    <feature policy='require' name='smep'/>
    <feature policy='require' name='bmi2'/>
    <feature policy='require' name='rdseed'/>
    <feature policy='require' name='adx'/>
    <feature policy='require' name='smap'/>
    <feature policy='require' name='clflushopt'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='xsavec'/>
    <feature policy='require' name='xgetbv1'/>
    <feature policy='require' name='mmxext'/>
    <feature policy='require' name='fxsr_opt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='extapic'/>
    <feature policy='require' name='cr8legacy'/>
    <feature policy='require' name='3dnowprefetch'/>
    <feature policy='require' name='osvw'/>
    <feature policy='require' name='skinit'/>
    <feature policy='require' name='wdt'/>
    <feature policy='require' name='tce'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='perfctr_nb'/>
    <feature policy='require' name='ibpb'/>

2. The guest XML after the update to 7.5:

# virsh dumpxml vm1
...
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='allow'>Opteron_G5</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='movbe'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='fsgsbase'/>
    <feature policy='require' name='bmi1'/>
    <feature policy='require' name='avx2'/>
    <feature policy='require' name='smep'/>
    <feature policy='require' name='bmi2'/>
    <feature policy='require' name='rdseed'/>
    <feature policy='require' name='adx'/>
    <feature policy='require' name='smap'/>
    <feature policy='require' name='clflushopt'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='xsavec'/>
    <feature policy='require' name='xgetbv1'/>
    <feature policy='require' name='mmxext'/>
    <feature policy='require' name='fxsr_opt'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='cr8legacy'/>
    <feature policy='require' name='osvw'/>
    <feature policy='require' name='ibpb'/>
    <feature policy='disable' name='rdtscp'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='xop'/>
    <feature policy='disable' name='fma4'/>
    <feature policy='disable' name='tbm'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='disable' name='extapic'/>
    <feature policy='disable' name='skinit'/>
    <feature policy='disable' name='wdt'/>
    <feature policy='disable' name='tce'/>
    <feature policy='disable' name='topoext'/>
    <feature policy='disable' name='perfctr_core'/>
    <feature policy='disable' name='perfctr_nb'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='sha-ni'/>
...


# virsh dumpxml vm1 --migratable
...
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='osxsave'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='extapic'/>
    <feature policy='require' name='skinit'/>
    <feature policy='require' name='wdt'/>
    <feature policy='require' name='tce'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='perfctr_nb'/>
    <feature policy='require' name='ibpb'/>
...


# virsh migrate --live vm1 qemu+ssh://target/system
error: internal error: Unknown CPU feature sha-ni
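
A rough way to spot such features before attempting the migration is to compare the feature names in the dumped <cpu> element against the names the older libvirt understands. The sketch below is only an illustration: the KNOWN_TO_OLD_LIBVIRT set is a hypothetical, deliberately tiny stand-in (a real check would need the complete feature list of the destination libvirt), and unknown_features is a helper name made up for this example, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete list used only for this illustration.
# A real check needs every feature name the destination libvirt knows.
KNOWN_TO_OLD_LIBVIRT = {"vme", "x2apic", "svm", "monitor"}


def unknown_features(cpu_xml, known):
    """Return the names of <feature> elements that are not in `known`,
    in document order."""
    root = ET.fromstring(cpu_xml)
    return [
        feature.get("name")
        for feature in root.iter("feature")
        if feature.get("name") not in known
    ]


# Shortened version of the <cpu> element from "virsh dumpxml vm1".
SAMPLE = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='allow'>Opteron_G5</model>
  <feature policy='require' name='vme'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='sha-ni'/>
</cpu>
"""

print(unknown_features(SAMPLE, KNOWN_TO_OLD_LIBVIRT))
```

Note that a feature is reported regardless of its policy: in the dump above sha-ni is only listed with policy='disable', yet the destination still rejects it because it does not know the name.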


Actual results:

libvirt adds the sha-ni CPU feature to an old guest, and this breaks the migration.

Expected results:

The guest can be migrated back to a 7.3 host.

Additional info:

From bug 1521202 comment 7:

We could possibly try to avoid adding CPU features which were not available in the old libvirt into the guest CPU definition when replacing host-model for a running domain. However, we need to do it in a very smart way to avoid breaking migration to new libvirt.

Comment 1 Jiri Denemark 2018-09-06 11:07:25 UTC
- the host-model code detects the host CPU as EPYC
- we translate it to Opteron_G5 because the running QEMU has no idea
  what EPYC is
- when we're updating the CPU model according to what CPUID bits QEMU
  enabled, we disable all the new features which QEMU did not know
  about and thus it could not enable them (see commit 83e081b8ab3)
- we should just remove such features rather than disabling them since
  QEMU will not enable them anywhere
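
The difference between disabling and removing such features can be sketched as follows. This is a simplified model of the behaviour described above, not libvirt's actual code: update_policies, its parameters and the three small feature sets are all hypothetical and chosen only to mirror the sha-ni case from this report.

```python
def update_policies(host_features, qemu_enabled, qemu_known, drop_unknown):
    """Return {feature name: policy} for the updated guest CPU definition.

    host_features: features of the detected host CPU model
    qemu_enabled:  features the running QEMU actually enabled
    qemu_known:    features the running QEMU knows about at all
    drop_unknown:  False models the current behaviour (disable explicitly),
                   True models the proposed one (leave the feature out)
    """
    policies = {}
    for name in sorted(host_features):
        if name in qemu_enabled:
            policies[name] = "require"
        elif name in qemu_known:
            # QEMU knows the feature but did not enable it.
            policies[name] = "disable"
        elif not drop_unknown:
            # QEMU has never heard of the feature; current behaviour
            # still writes it into the definition as disabled.
            policies[name] = "disable"
        # With drop_unknown=True the unknown feature is simply omitted.
    return policies


# Hypothetical, tiny feature sets for illustration only.
HOST = {"vme", "svm", "sha-ni"}
ENABLED = {"vme"}
KNOWN = {"vme", "svm"}

current = update_policies(HOST, ENABLED, KNOWN, drop_unknown=False)
proposed = update_policies(HOST, ENABLED, KNOWN, drop_unknown=True)
print(current)
print(proposed)
```

In this model only the first result mentions sha-ni, which is the name an older libvirt would reject; the second result describes the same effective CPU without referring to a feature the old side does not know.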

Comment 2 Jiri Denemark 2019-11-19 12:03:14 UTC
This is a bug which happens only when upgrading a host from RHEL 7.3 or older
with running domains to a newer RHEL release. Fixing it in 7.8 does not make a
lot of sense since users would need to live upgrade from 7.3 to 7.8 with
domains running on the host to benefit from the fix. There's no issue if no
domain is running during the upgrade or when upgrading from RHEL 7.4 or
newer.