Bug 1722348 - libvirtd does not update VM .xml configurations on filesystem after virsh snapshot/blockcommit
Summary: libvirtd does not update VM .xml configurations on filesystem after virsh snapshot/blockcommit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 30
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-20 06:49 UTC by Saso Tavcar
Modified: 2019-07-09 00:55 UTC
CC: 11 users

Fixed In Version: libvirt-5.1.0-9.fc30
Clone Of:
Environment:
Last Closed: 2019-07-09 00:55:09 UTC
Type: Bug
Embargoed:



Description Saso Tavcar 2019-06-20 06:49:52 UTC
Description of problem:

Recently we upgraded some KVM hosts from Fedora 29 to Fedora 30 and are now
seeing broken VM configurations on the filesystem after virsh snapshot/blockcommit.

The commands "virsh dumpxml ..." and "virsh dumpxml --inactive ..." show a different configuration than the one on the filesystem.
After a libvirtd restart or system reboot, the VM XML configurations on the filesystem (/etc/libvirt/qemu) are broken.

Everything is OK on Fedora 29 KVM hosts!
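
For reference, this is how the mismatch can be seen on an affected host (a sketch using the example domain name and the BACKUPING_NOW naming from this report; adjust both for your setup):

#!/bin/bash
# Sketch: compare what libvirtd reports as the inactive XML with the
# persistent copy kept on disk.
DOM=somedomain.com.ncloud   # example domain name from this report

echo "== inactive XML as reported by libvirtd =="
virsh dumpxml --inactive "$DOM" | grep -E 'BACKUPING_NOW|backingStore'

echo "== persistent XML stored on disk =="
grep -E 'BACKUPING_NOW|backingStore' /etc/libvirt/qemu/"$DOM".xml

# If only the second grep matches, the on-disk config is stale and will be
# loaded back on the next libvirtd restart or host reboot.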


Version-Release number of selected component (if applicable):

- Fedora 29 has libvirt 4.7.0 and qemu 3.0.1:

[root@solaris1 ~]# rpm -qa |grep libvirt-daemon-kvm
libvirt-daemon-kvm-4.7.0-3.fc29.x86_64

[root@solaris1 ~]# rpm -qa |grep qemu-system-x86
qemu-system-x86-core-3.0.1-3.fc29.x86_64
qemu-system-x86-3.0.1-3.fc29.x86_64

- Fedora 30 has libvirt 5.1.0 and qemu 3.1.0:

[root@server1 ~]# rpm -qa |grep libvirt-daemon-kvm
libvirt-daemon-kvm-5.1.0-8.fc30.x86_64

[root@server1 ~]# rpm -qa |grep qemu-system-x86
qemu-system-x86-3.1.0-8.fc30.x86_64
qemu-system-x86-core-3.1.0-8.fc30.x86_64


How reproducible:

/usr/bin/virsh --quiet  snapshot-create-as --domain somedomain.com.ncloud somedomain.com.ncloud-SNAPSHOT  --diskspec sda,file=/Virtualization/linux/somedomain.com/somedomain.com.ncloud.qcow2 --disk-only --atomic --quiesce

/usr/bin/virsh --quiet  blockcommit somedomain.com.ncloud sda --active --pivot

/usr/bin/virsh --quiet  snapshot-delete --domain somedomain.com.ncloud somedomain.com.ncloud-SNAPSHOT --metadata


Steps to Reproduce:

For every VM from "virsh list" we run the following steps (in a script) for VM backup; a consolidated sketch of the loop follows the command list:

/usr/bin/virsh --quiet  domblklist somedomain.com.ncloud
/usr/bin/virsh --quiet  dumpxml --inactive somedomain.com.ncloud > /Backuping/VMs/Daily/somedomain.com.ncloud.xml

/usr/bin/virsh --quiet  snapshot-create-as --domain somedomain.com.ncloud somedomain.com.ncloud-SNAPSHOT  --diskspec sda,file=/Virtualization/linux/somedomain.com/somedomain.com.ncloud.qcow2-BACKUPING_NOW --disk-only --atomic --quiesce

/usr/bin/virsh --quiet  snapshot-list somedomain.com.ncloud 

(/usr/bin/scp -p server1.somedomain.us:/Virtualization/linux/somedomain.com/somedomain.com.ncloud.qcow2 somedomain.com.ncloud.qcow2)


/usr/bin/virsh --quiet  blockcommit somedomain.com.ncloud sda --active --pivot

/usr/bin/virsh --quiet  snapshot-delete --domain somedomain.com.ncloud somedomain.com.ncloud-SNAPSHOT --metadata

/usr/bin/ssh server1.somedomain.us "/usr/bin/rm /Virtualization/linux/somedomain.com/somedomain.com.ncloud.qcow2-BACKUPING_NOW"
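
Put together, the loop looks roughly like this (a sketch; the backup directory, image path, and the -BACKUPING_NOW suffix are the examples used above, and the image copy/remote cleanup steps are abbreviated):

#!/bin/bash
# Sketch of the per-VM backup loop described above.
BACKUP_DIR=/Backuping/VMs/Daily
IMG_DIR=/Virtualization/linux/somedomain.com   # example path from this report

for DOM in $(virsh --quiet list --name); do
    virsh --quiet domblklist "$DOM"
    virsh --quiet dumpxml --inactive "$DOM" > "$BACKUP_DIR/$DOM.xml"

    # external disk-only snapshot so the base image stops changing
    virsh --quiet snapshot-create-as --domain "$DOM" "$DOM-SNAPSHOT" \
        --diskspec sda,file="$IMG_DIR/$DOM.qcow2-BACKUPING_NOW" \
        --disk-only --atomic --quiesce

    virsh --quiet snapshot-list "$DOM"

    # ... copy the now-quiescent base image away (scp in the original script) ...

    # merge the overlay back into the base image and drop the snapshot metadata
    virsh --quiet blockcommit "$DOM" sda --active --pivot
    virsh --quiet snapshot-delete --domain "$DOM" "$DOM-SNAPSHOT" --metadata

    rm -f "$IMG_DIR/$DOM.qcow2-BACKUPING_NOW"
done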


Actual results:

When the VM backup is done, the data has been merged with "virsh blockcommit ...", and the snapshot has been deleted, the VM configurations on the filesystem (/etc/libvirt/qemu) are broken:

...
/usr/bin/virsh --quiet  blockcommit somedomain.com.ncloud sdd --active --pivot
/usr/bin/virsh --quiet  snapshot-delete --domain somedomain.com.ncloud somedomain.com.ncloud-SNAPSHOT --metadata


the VM configurations end up in the following state:


- active (virsh dumpxml),

[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml  somedomain.com.ncloud|grep BACK ;;; OK

[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml  somedomain.com.ncloud|grep backingStore ;;; why are empty backingStore elements left behind?
     <backingStore/>
     <backingStore/>
     <backingStore/>
     <backingStore/>

- inactive

[root@server1 qemu]# virsh dumpxml --inactive somedomain.com.ncloud |grep BACK          ;;; OK
[root@server1 qemu]# virsh dumpxml --inactive somedomain.com.ncloud |grep backingStore  ;;; OK


- XML on filesystem (the .xml file on the filesystem has not been updated/reverted since the snapshot was taken - NOT OK! It should have been cleared of the snapshot source file and backingStore)

[root@server1 ~]# ls -al /etc/libvirt/qemu/somedomain.com.ncloud.xml
-rw-------. 1 root root 6260 Jun 18 23:00 /etc/libvirt/qemu/somedomain.com.ncloud.xml

[root@server1 qemu]# cat somedomain.com.ns2.xml |grep BACK
     <source file='/Virtualization/linux/somedomain.com/somedomain.com.ns2.qcow2-BACKUPING_NOW'/>

[root@server1 qemu]# cat somedomain.com.ns2.xml |grep backingStore
     <backingStore type='file' index='1'>
     </backingStore>


Expected results:


The XML configurations on the filesystem should be the same as before the snapshot was taken (all good, nothing found):

[root@server1 ~]# cat /etc/libvirt/qemu/somedomain.com.ncloud.xml| grep BACK
[root@server1 ~]# cat /etc/libvirt/qemu/somedomain.com.ncloud.xml| grep backingStore
[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml somedomain.com.ncloud|grep BACK
[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml somedomain.com.ncloud|grep backingStore
[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml --inactive somedomain.com.ncloud|grep BACK
[root@server1 ~]# /usr/bin/virsh --quiet  dumpxml --inactive somedomain.com.ncloud|grep backingStore


Additional info:

Comment 1 Peter Krempa 2019-06-20 07:07:55 UTC
This is a known problem which was already fixed upstream:

commit 4d8cc5a07a0dcc0ac99377f66a4649d219705452
Author: Peter Krempa <pkrempa>
Date:   Fri May 17 10:15:53 2019 +0200

    qemu: blockjob: Fix saving of inactive XML after completed legacy blockjob
    
    Commit c257352797 introduced a logic bug where we will never save the
    inactive XML after a blockjob as the variable which was determining
    whether to do so is cleared right before. Thus even if we correctly
    modify the inactive state it will be rolled back when libvirtd is
    restarted.

It was broken in libvirt 5.1.0 and fixed in 5.4.0.
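
Until a fixed build is available, one possible manual workaround (a sketch, not something recommended in this report; it assumes, as shown above, that the inactive XML libvirtd reports is already correct and only the on-disk copy is stale) is to redefine the domain from its inactive XML after each blockcommit, which makes libvirt rewrite the persistent file under /etc/libvirt/qemu:

DOM=somedomain.com.ncloud   # example domain from this report
# Dump the (correct) inactive definition, including security-sensitive data,
# then redefine the domain from it so the persistent XML is rewritten.
virsh dumpxml --inactive --security-info "$DOM" > /tmp/"$DOM".xml
virsh define /tmp/"$DOM".xml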

Moving to POST to note that the fix can be picked up. Alternatively, you can try the virt-preview repo [1], which should contain 5.4.0, to test it; but that version might not work properly with SELinux due to a different bug which will be fixed in 5.5.0 (setting SELinux to permissive works around it).

[1] https://fedoraproject.org/wiki/Virtualization_Preview_Repository

Comment 2 Saso Tavcar 2019-06-20 10:47:51 UTC
Thank you.

I've rebuilt/recompiled the SRPM package

https://libvirt.org/sources/libvirt-5.4.0-1.fc28.src.rpm 

All issues are now resolved.
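
For anyone who wants to do the same, a local SRPM rebuild on Fedora typically looks roughly like this (a sketch; it assumes rpm-build and the dnf builddep plugin from dnf-plugins-core are available):

# Rough sketch of rebuilding the libvirt SRPM locally
sudo dnf install rpm-build dnf-plugins-core
curl -O https://libvirt.org/sources/libvirt-5.4.0-1.fc28.src.rpm
sudo dnf builddep libvirt-5.4.0-1.fc28.src.rpm
rpmbuild --rebuild libvirt-5.4.0-1.fc28.src.rpm
# resulting packages end up under ~/rpmbuild/RPMS/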

Comment 3 Fedora Update System 2019-06-20 17:41:32 UTC
FEDORA-2019-b2dfb13daf has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-b2dfb13daf

Comment 4 Fedora Update System 2019-06-22 06:04:16 UTC
libvirt-5.1.0-9.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b2dfb13daf

Comment 5 Fedora Update System 2019-07-09 00:55:09 UTC
libvirt-5.1.0-9.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

