Bug 1481309

Summary: VM does not start after snapshot revert with error "Property '.cmt' not found"
Product: Red Hat Enterprise Linux 7 Reporter: Gurenko Alex <agurenko>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Han Han <hhan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: agurenko, dhill, dyuan, lmen, rbalakri, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-3.8.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 10:55:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1199452    
Attachments:
Description Flags
undercloud-0 xml dump none

Description Gurenko Alex 2017-08-14 14:37:59 UTC
Description of problem: On a RHEL 7.4 hypervisor, VM snapshots cannot be reverted with a simple virsh snapshot revert. Starting the domain after the revert fails with the following error:

error: Failed to start domain undercloud-0
error: internal error: qemu unexpectedly closed the monitor: 2017-08-13T13:56:49.956514Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2017-08-13T13:56:49.960480Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=on: Property '.cmt' not found
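When triaging logs like the one above, it can help to pull the CPU type and the rejected property out of the QEMU message. The following is a minimal sketch, assuming the message always has the shape seen in this report; the function name `parse_missing_cpu_property` and the regular expression are illustrative and not part of libvirt or QEMU.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the single error line quoted in this bug report.
# It is an assumption that other QEMU versions word the message the same way.
_MISSING_PROP = re.compile(
    r"can't apply global (?P<cpu_type>\S+)\.(?P<prop>[\w-]+)=(?P<value>\S+): "
    r"Property '\.(?P=prop)' not found"
)


def parse_missing_cpu_property(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (cpu_type, property, value) if the line matches, else None."""
    match = _MISSING_PROP.search(line)
    if match is None:
        return None
    return match.group("cpu_type"), match.group("prop"), match.group("value")
```

Applied to the second log line above, the sketch would identify `cmt` as the property that qemu-kvm does not know for the `Haswell-noTSX-x86_64-cpu` type.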


Version-Release number of selected component (if applicable): RHEL 7.4

kernel: 3.10.0-693.el7.x86_64
libvirt: libvirt-3.2.0-14.el7_4.2.x86_64

How reproducible: 100%


Steps to Reproduce:
1. Deploy a node with InfraRed
2. Make a snapshot of this node
3. Revert snapshot and start a VM

Actual results: 

error: Failed to start domain

Expected results:

domain started successfully 

Additional info:

Comment 2 Han Han 2017-08-15 06:24:12 UTC
(In reply to Gurenko Alex from comment #0)
Hi Gurenko, I need some details to reproduce the bug:
1. Is this host deployed with openstack InfraRed?
2. What is the qemu-kvm or qemu-kvm-rhev version?
3. What is your host CPU?
4. Please provide the VM's xml by `virsh dumpxml <VM_NAME>`

Thanks

Comment 3 Gurenko Alex 2017-08-15 07:52:03 UTC
Created attachment 1313497 [details]
undercloud-0 xml dump

(In reply to Han Han from comment #2)
> Hi Gurenko, I need some details to reproduce the bug:
> 1. Is this host deployed with openstack InfraRed?
> 2. What is the qemu-kvm or qemu-kvm-rhev version?
> 3. What is your host CPU?
> 4. Please provide the VM's xml by `virsh dumpxml <VM_NAME>`
> 
> Thanks

1. The VM hosts are created by the virsh plugin of InfraRed; the hypervisor is provisioned with Foreman.
2. qemu-kvm-common-rhev-2.9.0-16.el7_4.3.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
3. Intel(R) Xeon(R) CPU E5-2630 v3
4. I've attached the XML to the bug.

Comment 4 Jiri Denemark 2017-09-15 08:42:47 UTC
Could you please provide two XMLs: one generated just after creating the domain (between step 1 and 2) and another one generated after reverting the snapshot (after step 3)?

Comment 5 Jiri Denemark 2017-09-18 12:40:34 UTC
I managed to reproduce this issue locally (on cmt-capable hardware).

This is similar to bug 1485022, but as there are two issues which need to be fixed, I'll keep both bugs open.

The first issue is the addition of unsupported or unknown CPU features when updating an inactive guest CPU definition (virsh dumpxml --inactive --update-cpu). This will be covered by this bz.

The second issue causes libvirt to update the guest CPU when creating an offline snapshot, which is not expected. The domain is not running, so we don't need to keep the exact ABI of the guest CPU, and CPUs in offline snapshots should not be updated at all. This issue is covered by bug 1485022.

Comment 6 Jiri Denemark 2017-09-18 12:47:58 UTC
Once bug 1485022 is fixed, it will no longer be possible to verify this bug using snapshots. You can use the following easy steps to reproduce and verify this bug:

1. On a host where libvirt lists the "cmt" feature in host CPU capabilities (such as a host with Intel E5-2630, E5-2650, or E7-8890 CPUs), define a domain with <cpu mode='host-model'/>
2. virsh dumpxml --inactive --update-cpu

The guest CPU in the output should list fewer features; most importantly, it should not contain the "cmt" feature.
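The check in step 2 can be scripted. The following is a minimal sketch, assuming the guest CPU features appear as <feature name='...' policy='...'/> children of the <cpu> element in the domain XML; the helper names (`dump_inactive_updated_xml`, `cpu_features`, `lists_feature`) are made up for illustration, and the virsh call passes a domain name in addition to the flags from step 2.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict


def dump_inactive_updated_xml(domain: str) -> str:
    """Run the command from step 2 for one domain and return its XML output."""
    result = subprocess.run(
        ["virsh", "dumpxml", domain, "--inactive", "--update-cpu"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def cpu_features(domain_xml: str) -> Dict[str, str]:
    """Map each feature name under <cpu> to its policy attribute."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        return {}
    return {
        feature.get("name"): feature.get("policy", "")
        for feature in cpu.findall("feature")
        if feature.get("name")
    }


def lists_feature(domain_xml: str, name: str = "cmt") -> bool:
    """True if the guest CPU definition mentions the feature at all."""
    return name in cpu_features(domain_xml)
```

On an affected host, `lists_feature(dump_inactive_updated_xml("undercloud-0"))` would be expected to return True before the fix and False after it; the sketch does not distinguish between feature policies.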

Comment 7 Jiri Denemark 2017-09-18 13:37:21 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-September/msg00517.html

Comment 8 Jiri Denemark 2017-09-22 11:21:09 UTC
Both issues mentioned in comment 5 are fixed upstream by

commit 7e874326a3eca1233017ab91774d845b99869af1
Refs: v3.7.0-150-g7e874326a3
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 30 17:05:22 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 21 15:27:39 2017 +0200

    qemu: Use correct host model for updating guest cpu

    When a user requested a domain XML description with
    VIR_DOMAIN_XML_UPDATE_CPU flag, libvirt would use the host CPU
    definition from host capabilities rather than the one which will
    actually be used once the domain is started.

    https://bugzilla.redhat.com/show_bug.cgi?id=1481309

    Signed-off-by: Jiri Denemark <jdenemar>

commit 06f75ff2cb292e2658b4f2f6949c700550006272
Refs: v3.7.0-151-g06f75ff2cb
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Jun 30 16:55:20 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 21 15:27:39 2017 +0200

    qemu: Don't update CPU when formatting live def

    Since commit v2.2.0-199-g7ce711a30e libvirt stores an updated guest CPU
    in domain's live definition and there's no need to update it every time
    we want to format the definition. The commit itself tried to address
    this in qemuDomainFormatXML, but forgot to fix qemuDomainDefFormatLive.
    Not to mention that masking a previously set flag is only acceptable if
    the flag was set by a public API user. Internally, libvirt should have
    never set the flag in the first place.

    https://bugzilla.redhat.com/show_bug.cgi?id=1485022

    Signed-off-by: Jiri Denemark <jdenemar>

Comment 9 David Hill 2017-09-23 21:39:47 UTC
Would live migration also trigger this bug?

Comment 10 Jiri Denemark 2017-09-25 07:48:04 UTC
No, this bug should not affect migration.

If you are seeing a problem with migration, please file a new BZ and provide all relevant details, such as the libvirt version on both hosts, the output of "virsh capabilities" and "virsh domcapabilities" from both hosts, the XML of the running domain on the source host, and debug logs from libvirtd on both sides covering the migration.

Comment 13 Han Han 2018-01-09 02:49:16 UTC
Bug reproduced on libvirt-3.2.0-14.el7_4.5 following the steps from comment 6.

Verified on libvirt-3.9.0-7.el7.x86_64 with qemu-kvm-rhev-2.10.0-15.el7.x86_64:
1. Start a VM of machine type pc-i440fx-rhel7.4.0 with <cpu mode='host-model'/> on a host with CPU model name 'Intel(R) Xeon(R) CPU E5-2630 v3' 

2. Compare the CPU features reported by `virsh dumpxml V` and `virsh dumpxml V --inactive --update-cpu`:
# diff <( virsh dumpxml V ) <( virsh dumpxml V --inactive --update-cpu )|less
19,20c16,17
<   <cpu mode='custom' match='exact' check='full'>
<     <model fallback='forbid'>Haswell-noTSX</model>
---
>   <cpu mode='custom' match='exact' check='partial'>
>     <model fallback='allow'>Haswell-noTSX</model>
...

No differences in features. 

# virsh dumpxml V --inactive --update-cpu|grep cmt

No cmt feature in the result.


Rerun with machine type pc-i440fx-rhel7.5.0:
# diff <( virsh dumpxml V ) <( virsh dumpxml V --inactive --update-cpu )|less
19,20c16,17
<   <cpu mode='custom' match='exact' check='full'>
<     <model fallback='forbid'>Haswell-noTSX</model>
---
>   <cpu mode='custom' match='exact' check='partial'>
>     <model fallback='allow'>Haswell-noTSX</model>

No differences in features.

# virsh dumpxml V --inactive --update-cpu|grep cmt

No cmt feature in the result.
Bug fixed.
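The feature comparison in this verification can also be done on parsed XML rather than with a textual diff, which ignores unrelated differences such as the check and fallback attributes. The following is a minimal sketch, assuming features are <feature name='...' policy='...'/> children of <cpu>; the names `feature_set` and `feature_differences` are hypothetical helpers, not libvirt APIs.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Feature = Tuple[str, str]  # (name, policy)


def feature_set(domain_xml: str) -> Set[Feature]:
    """Collect (name, policy) pairs for every <feature> under <cpu>."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        return set()
    return {
        (feature.get("name", ""), feature.get("policy", ""))
        for feature in cpu.findall("feature")
    }


def feature_differences(
    live_xml: str, updated_xml: str
) -> Tuple[Set[Feature], Set[Feature]]:
    """Return (only in live XML, only in inactive updated XML)."""
    live = feature_set(live_xml)
    updated = feature_set(updated_xml)
    return live - updated, updated - live
```

Two empty sets correspond to the "No differences in features" result above; a pair such as ("cmt", "require") in the second set would indicate the behaviour described in this bug.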

Comment 18 errata-xmlrpc 2018-04-10 10:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704