Bug 1485022

Summary: Guest CPU in an offline snapshot changes from host-model to custom
Product: Red Hat Enterprise Linux 7
Reporter: Strahil Nikolov <hunter86_bg>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Luyao Huang <lhuang>
Severity: medium
Priority: unspecified
Version: 7.4
CC: bugzilla, dhill, dyuan, hhan, hunter86_bg, jsuchane, juzhou, kchamart, kuwei, lmen, mxie, rbalakri, tzheng, xiaodwan, xuzhang, yalzhang
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-3.8.0-1.el7
Last Closed: 2018-04-10 10:55:31 UTC
Type: Bug
Bug Blocks: 1199452
Attachments: Virt-Manager screenshot (flags: none)

Description Strahil Nikolov 2017-08-24 21:15:26 UTC
Created attachment 1317911
Virt-Manager screenshot

Description of problem:
After restoring a snapshot, the VM cannot be started because 'invtsc' is automatically added to the CPU section of its configuration.

Version-Release number of selected component (if applicable):
libvirt-3.2.0-14.el7_4.2.x86_64
virt-manager-1.4.1-7.el7.noarch
virt-manager-common-1.4.1-7.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set CPU to "Copy host CPU configuration" (FX-8350 in my case)
2. Run and shut down the VM
3. Take a snapshot
4. Restore from snapshot
5. Start the VM


Actual results:
Cannot start the VM due to:
Error starting domain: unsupported configuration: host doesn't support invariant TSC

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 88, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 124, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1489, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1039, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: unsupported configuration: host doesn't support invariant TSC



Expected results:
The VM starts with the configuration it had before the snapshot.
The following should not be added to the VM's XML:
<feature policy='require' name='invtsc'/>
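A quick way to see whether a restored guest has picked up the unwanted feature is to list the CPU features its XML marks as required. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name `required_cpu_features` and the sample XML are hypothetical (not taken from this report), and in practice the input would be the output of `virsh dumpxml` for the guest.

```python
import xml.etree.ElementTree as ET
from typing import List

def required_cpu_features(domain_xml: str) -> List[str]:
    """Return the names of CPU features with policy='require' in a domain XML."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")  # <cpu> is a direct child of <domain>
    if cpu is None:
        return []
    return [f.get("name") for f in cpu.findall("feature")
            if f.get("policy") == "require"]

# Hypothetical excerpt of a domain XML after the faulty restore.
sample = """<domain type='kvm'>
  <cpu mode='custom' match='exact' check='full'>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
</domain>"""

if "invtsc" in required_cpu_features(sample):
    print("invtsc is required; a host without invariant TSC will refuse to start this guest")
```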

Additional info:
Workaround: select another CPU type, save, and then switch back to "Copy host CPU configuration".

Comment 4 Pavel Hrdina 2017-09-11 13:17:40 UTC
Moving to libvirt since this isn't a virt-manager issue.

Comment 5 Strahil Nikolov 2017-09-11 13:50:14 UTC
Update:

Tried again with CPU model "Opteron_G5" on a newly built VM, and the issue happened again.
How to reproduce:
1. Shut down the VM gracefully
2. Create snapshot
3. Run selected snapshot

Version of libvirt and accompanying software:

fence-virtd-libvirt-0.3.2-12.el7.x86_64
libvirt-3.2.0-14.el7_4.3.x86_64
libvirt-client-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-config-network-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-network-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.3.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.3.x86_64
libvirt-gconfig-1.0.0-1.el7.x86_64
libvirt-glib-1.0.0-1.el7.x86_64
libvirt-gobject-1.0.0-1.el7.x86_64
libvirt-libs-3.2.0-14.el7_4.3.x86_64
libvirt-python-3.2.0-3.el7.x86_64

Workaround: select another CPU model, apply, and then return to the original one.

Comment 6 Kashyap Chamarthy 2017-09-12 13:01:48 UTC
A simple reproducer where the guest CPU model is changed from 'host-model' to 'custom', after an offline, internal snapshot is created:


Check the current CPU mode on the offline guest:

    $ virsh dumpxml cvm1 | grep host-model
      <cpu mode='host-model' check='partial'>

Create an offline, internal snapshot:

    $ virsh snapshot-create-as cvm1 offline-int2
    Domain snapshot offline-int2 created

Check the snapshot metadata for what CPU mode it has (it now has
'custom'):

    $ virsh snapshot-dumpxml cvm1 offline-int2  | grep custom
      <cpu mode='custom' match='exact' check='partial'>

Start the guest:

    $ virsh start cvm1
    Domain cvm1 started

Now check for the 'host-model' CPU mode; it is no longer present:

    $ virsh dumpxml cvm1 | grep host-model
    $ echo $?
    1

Instead, you see the 'custom' CPU mode:

    $ virsh dumpxml cvm1 | grep custom -A14
      <cpu mode='custom' match='exact' check='full'>
        <model fallback='forbid'>Haswell-noTSX</model>
        <vendor>Intel</vendor>
        <feature policy='require' name='vme'/>
        <feature policy='require' name='ss'/>
        <feature policy='require' name='vmx'/>
        <feature policy='require' name='f16c'/>
        <feature policy='require' name='rdrand'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='require' name='arat'/>
        <feature policy='require' name='tsc_adjust'/>
        <feature policy='require' name='xsaveopt'/>
        <feature policy='require' name='pdpe1gb'/>
        <feature policy='require' name='abm'/>
      </cpu>

Comment 7 Jiri Denemark 2017-09-15 08:32:43 UTC
(In reply to Kashyap Chamarthy from comment #6)
> A simple reproducer where the guest CPU model is changed from 'host-model'
> to 'custom', after an offline, internal snapshot is created:
> 
> 
> Check the current CPU mode on the offline guest:
> 
>     $ virsh dumpxml cvm1 | grep host-model
>       <cpu mode='host-model' check='partial'>
> 
> Create an offline, internal snapshot:
> 
>     $ virsh snapshot-create-as cvm1 offline-int2
>     Domain snapshot offline-int2 created
> 
> Check the snapshot metadata for what CPU mode it has (it now has
> 'custom'):
> 
>     $ virsh snapshot-dumpxml cvm1 offline-int2  | grep custom
>       <cpu mode='custom' match='exact' check='partial'>

This is enough to reproduce the issue.

> Start the guest:
> 
>     $ virsh start cvm1
>     Domain cvm1 started
> 
> Now check for the 'host-model' CPU mode, it is no longer present:
> 
>     $ virsh dumpxml cvm1 | grep host-model
>     $ echo $?
>     1

This is useless. A running domain will never have a host-model in its live XML.

Comment 8 Jiri Denemark 2017-09-18 11:41:15 UTC
This is similar to bug 1473516, but as there are two issues which need to be fixed, I'll keep both bugs open.

The first issue is the addition of unsupported or unknown CPU features when updating inactive guest CPU definition (virsh dumpxml --inactive --update-cpu). This will be covered by bug 1473516.

The second issue causes libvirt to update the guest CPU when creating an offline snapshot, which is not expected. Since the domain is not running, there is no need to preserve the exact ABI of the guest CPU, so CPUs in offline snapshots should not be updated at all. This issue is covered by this BZ.
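The rule from this comment, combined with the fact that a running domain's live definition already holds the expanded CPU, can be illustrated with a toy Python model. It is a hypothetical sketch, not libvirt's C code: dictionaries stand in for <cpu> elements, and `expand_host_model`, `start`, and `snapshot_cpu` are invented names.

```python
import copy
from typing import Dict, Optional

Cpu = Dict[str, str]  # toy stand-in for a <cpu> element's attributes

def expand_host_model(cpu: Cpu) -> Cpu:
    """Hypothetical stand-in for turning host-model into a concrete custom CPU."""
    if cpu.get("mode") != "host-model":
        return dict(cpu)
    return {"mode": "custom", "match": "exact", "check": "full"}

def start(persistent_cpu: Cpu) -> Cpu:
    """Starting the domain builds the live CPU; this is the only place it is expanded."""
    return expand_host_model(persistent_cpu)

def snapshot_cpu(persistent_cpu: Cpu, live_cpu: Optional[Cpu]) -> Cpu:
    """CPU recorded in a snapshot: the live CPU if running, else the persistent one.

    Nothing is updated while the definition is being saved: a running domain's
    live CPU was already expanded at start, and an offline domain has no guest
    ABI that needs to be preserved.
    """
    source = live_cpu if live_cpu is not None else persistent_cpu
    return copy.deepcopy(source)

cpu = {"mode": "host-model", "check": "partial"}
print(snapshot_cpu(cpu, None)["mode"])        # offline snapshot -> host-model
print(snapshot_cpu(cpu, start(cpu))["mode"])  # running domain   -> custom
```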

Comment 9 Jiri Denemark 2017-09-18 12:36:05 UTC
Comment 6 shows an easy reproducer. To sum up:

1. Define a domain with <cpu mode='host-model'/>
2. While the domain is NOT running, take its snapshot: virsh snapshot-create-as $DOM snap
3. virsh snapshot-dumpxml $DOM snap

Once this bug is fixed, the XML returned in step 3 should contain a CPU with mode='host-model'.
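Step 3 can be checked mechanically. A minimal Python sketch, assuming `virsh` is on PATH and that the snapshot XML nests the saved definition in a <domain> element under <domainsnapshot>; `cpu_mode` and `snapshot_keeps_host_model` are hypothetical helper names.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional

def cpu_mode(xml_text: str) -> Optional[str]:
    """Return the mode attribute of <cpu> in a domain or snapshot XML (None if absent)."""
    root = ET.fromstring(xml_text)
    # Snapshot XML nests the saved definition as <domainsnapshot><domain>...</domain>.
    path = "domain/cpu" if root.tag == "domainsnapshot" else "cpu"
    cpu = root.find(path)
    return None if cpu is None else cpu.get("mode")

def snapshot_keeps_host_model(dom: str, snap: str) -> bool:
    """Steps 2-3: take an offline snapshot, then check its CPU mode (needs virsh)."""
    subprocess.run(["virsh", "snapshot-create-as", dom, snap], check=True)
    out = subprocess.run(["virsh", "snapshot-dumpxml", dom, snap],
                         check=True, capture_output=True, text=True).stdout
    return cpu_mode(out) == "host-model"
```

On an affected libvirt, `snapshot_keeps_host_model` returns False because the snapshot records mode='custom'.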

Comment 10 Jiri Denemark 2017-09-18 12:37:36 UTC
Oops, there is a wrong bug number in comment 8. It should have referenced bug 1481309.

Comment 11 Jiri Denemark 2017-09-18 13:37:31 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-September/msg00517.html

Comment 12 Jiri Denemark 2017-09-22 11:21:12 UTC
Both issues mentioned in comment 8 are fixed upstream by

commit 7e874326a3eca1233017ab91774d845b99869af1
Refs: v3.7.0-150-g7e874326a3
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 30 17:05:22 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 21 15:27:39 2017 +0200

    qemu: Use correct host model for updating guest cpu

    When a user requested a domain XML description with
    VIR_DOMAIN_XML_UPDATE_CPU flag, libvirt would use the host CPU
    definition from host capabilities rather than the one which will
    actually be used once the domain is started.

    https://bugzilla.redhat.com/show_bug.cgi?id=1481309

    Signed-off-by: Jiri Denemark <jdenemar>

commit 06f75ff2cb292e2658b4f2f6949c700550006272
Refs: v3.7.0-151-g06f75ff2cb
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 30 16:55:20 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 21 15:27:39 2017 +0200

    qemu: Don't update CPU when formatting live def

    Since commit v2.2.0-199-g7ce711a30e libvirt stores an updated guest CPU
    in domain's live definition and there's no need to update it every time
    we want to format the definition. The commit itself tried to address
    this in qemuDomainFormatXML, but forgot to fix qemuDomainDefFormatLive.
    Not to mention that masking a previously set flag is only acceptable if
    the flag was set by a public API user. Internally, libvirt should have
    never set the flag in the first place.

    https://bugzilla.redhat.com/show_bug.cgi?id=1485022

    Signed-off-by: Jiri Denemark <jdenemar>
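The flag handling described in the second commit message can be modelled as two code paths. This is a hypothetical Python sketch, not libvirt's implementation: the constants only mirror the idea of the public VIR_DOMAIN_XML_INACTIVE and VIR_DOMAIN_XML_UPDATE_CPU flags, and their values here are illustrative. The public path may mask a caller-supplied UPDATE_CPU when it formats a live definition; the internal path simply never adds it.

```python
# Illustrative bit flags mirroring libvirt's public VIR_DOMAIN_XML_* flags
# (names shortened, values chosen for illustration only).
XML_SECURE = 1 << 0
XML_INACTIVE = 1 << 1
XML_UPDATE_CPU = 1 << 2

def public_format_flags(user_flags: int, domain_active: bool) -> int:
    """Public API path: UPDATE_CPU, if present, was requested by the caller.

    When the live definition of a running domain is formatted, it already
    holds the updated CPU, so masking the caller's flag is acceptable here.
    """
    formats_live_def = domain_active and not (user_flags & XML_INACTIVE)
    if formats_live_def:
        return user_flags & ~XML_UPDATE_CPU
    return user_flags

def internal_live_format_flags(flags: int) -> int:
    """Internal path formatting a definition for libvirt's own use.

    After the fix UPDATE_CPU is never added here; the buggy behaviour
    corresponds to returning `flags | XML_UPDATE_CPU` instead.
    """
    return flags
```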

Comment 16 Pavel Hrdina 2017-10-12 12:37:40 UTC
*** Bug 1501341 has been marked as a duplicate of this bug. ***

Comment 17 Luyao Huang 2017-12-20 08:59:56 UTC
Verified this bug with libvirt-3.9.0-6.el7.x86_64:

1. prepare a guest with host-model cpu mode:

# virsh dumpxml r7-mig

  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
  </cpu>

2. create snapshot:

# virsh snapshot-create-as r7-mig s1
Domain snapshot s1 created

# virsh snapshot-list r7-mig
 Name                 Creation Time             State
------------------------------------------------------------
 s1                   2017-12-20 03:42:58 -0500 shutoff

3. check the snapshot xml:

# virsh snapshot-dumpxml r7-mig s1 | grep -A5 "cpu mode"
    <cpu mode='host-model' check='partial'>
      <model fallback='allow'/>
      <numa>
        <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
        <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
      </numa>

4. revert snapshot and recheck the guest xml:

# virsh snapshot-revert r7-mig --current

# virsh dumpxml r7-mig |grep -A5 "cpu mode"
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>

5. start guest and recheck xml:

# virsh start r7-mig
Domain r7-mig started


# virsh dumpxml r7-mig |grep -A20 "cpu mode"
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Opteron_G5</model>
    <vendor>AMD</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='bmi1'/>
    <feature policy='require' name='mmxext'/>
    <feature policy='require' name='fxsr_opt'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='cr8legacy'/>
    <feature policy='require' name='osvw'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='rdtscp'/>
    <numa>
      <cell id='0' cpus='0-2' memory='524288' unit='KiB'/>
      <cell id='1' cpus='3-5' memory='524288' unit='KiB'/>
    </numa>
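The verification steps above can be replayed by a small script. This is a sketch under stated assumptions: the domain is shut off and uses a host-model CPU, `virsh` is on PATH, and `verify`, `cpu_mode`, and the injectable `run` parameter are hypothetical names chosen so the logic can be exercised without a hypervisor. Note that it starts the guest.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]

def virsh(args: List[str]) -> str:
    """Run `virsh ARGS...` and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(["virsh", *args], check=True,
                          capture_output=True, text=True).stdout

def cpu_mode(xml_text: str) -> Optional[str]:
    """<cpu mode=...> of a domain XML or of the domain inside a snapshot XML."""
    root = ET.fromstring(xml_text)
    cpu = root.find("domain/cpu") if root.tag == "domainsnapshot" else root.find("cpu")
    return None if cpu is None else cpu.get("mode")

def verify(dom: str, snap: str, run: Runner = virsh) -> None:
    """Replay steps 2-5 for a shut-off domain with a host-model CPU."""
    run(["snapshot-create-as", dom, snap])
    assert cpu_mode(run(["snapshot-dumpxml", dom, snap])) == "host-model"
    run(["snapshot-revert", dom, "--current"])
    assert cpu_mode(run(["dumpxml", dom])) == "host-model"  # inactive XML
    run(["start", dom])
    # A running domain's live XML always carries the expanded CPU (comment 7).
    assert cpu_mode(run(["dumpxml", dom])) == "custom"
```

On a real host, `verify("r7-mig", "s1")` fails with an AssertionError at the first step whose CPU mode differs from the output shown above.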

Comment 21 errata-xmlrpc 2018-04-10 10:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704