Bug 1741456 - Image cannot be used after blockcommit snapshots to base image and destroy/start vm
Summary: Image cannot be used after blockcommit snapshots to base image and destroy/st...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: 8.0
Assignee: Michal Privoznik
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks: 1652078 1771501
 
Reported: 2019-08-15 08:09 UTC by yisun
Modified: 2020-11-06 04:44 UTC
13 users

Fixed In Version: libvirt-5.6.0-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1771501
Environment:
Last Closed: 2020-02-04 18:28:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0404 0 None None None 2020-02-04 18:29:50 UTC

Description yisun 2019-08-15 08:09:58 UTC
Image cannot be used after blockcommit snapshots to base image and destroy/start vm

Versions:
libvirt-5.6.0-1.module+el8.1.0+3890+4d3d259c.x86_64
qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3.x86_64

How reproducible:
100%

Scenario 1: Block commit from the top image to the base image with --active and without --pivot, then restart the VM
1. Start with a running VM that has one virtual disk, vda
$ virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

$ virsh domblklist avocado-vt-vm1
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2

2. Create a disk-only snapshot of the VM
$ virsh snapshot-create-as avocado-vt-vm1 snap_1 --disk-only
Domain snapshot snap_1 created

3. Run blockcommit to merge the snapshot into the base image
$ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
Block commit: [100 %]
Now in synchronized phase

$ virsh blockjob avocado-vt-vm1 vda
Active Block Commit: [100 %]


4. Abort the block job (this step is optional; if it is skipped, the VM is restarted with an active block job)
$ virsh blockjob avocado-vt-vm1 vda --abort

$ virsh blockjob avocado-vt-vm1 vda
No current block job for vda

5. Restart the VM
$ virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

Domain avocado-vt-vm1 started

6. The base image can no longer be used: another blockcommit fails, and so does using the image directly in this VM.
$ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use


Scenario 2: Block commit from the middle image to the base image, then restart the VM
The following is the scenario used in our automated test case.
1. Start with a running VM that has one virtual disk, vda
$ virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

$ virsh domblklist avocado-vt-vm1
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2

2. Create 2 disk-only snapshots
$ for i in {snap_1,snap_2}; do virsh snapshot-create-as avocado-vt-vm1 $i --disk-only; done
Domain snapshot snap_1 created
Domain snapshot snap_2 created

3. Run blockcommit from the middle image to the base image
$ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --top vda[1]
Block commit: [100 %]
Commit complete

4. Destroy and start the VM
$ virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

Domain avocado-vt-vm1 started

5. Try to use the base image again.
5.1 Run blockcommit to merge everything into the base image.
$ virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use

5.2 With the VM shut off, use virsh edit to set /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 as the source file of vda, then start the VM
$ virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>

$ virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/RHEL-8.1-x86_64-latest.qcow2 which is already in use


Expected result:
The original image should be usable without problems in scenario 1 step 6 and scenario 2 step 5.
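For re-running the steps above, a rough sketch of both scenarios as shell functions. Assumptions: it runs as root on a host with SELinux enforcing, and DOM names a running domain whose only disk is vda (the defaults come from this report). The function names scenario1, scenario2 and restart_and_check are hypothetical and not part of any existing test suite; only the virsh commands are taken from the report.

```shell
#!/bin/sh
# Hypothetical reproducer sketch for scenarios 1 and 2 above.
# Assumes: run as root, SELinux enforcing, and $DOM is a running domain
# whose only disk is vda (names taken from this report).
DOM=${DOM:-avocado-vt-vm1}
LABEL_ERR='Setting different SELinux label'

# Scenario 1, steps 2-4: one snapshot, active commit without --pivot,
# then abort the job (step 4 is optional in the report).
scenario1() {
    virsh snapshot-create-as "$DOM" snap_1 --disk-only
    virsh blockcommit "$DOM" vda --wait --verbose --active
    virsh blockjob "$DOM" vda --abort
}

# Scenario 2, steps 2-3: two snapshots, then commit the middle image.
# 'vda[1]' is quoted so the shell cannot glob-expand it.
scenario2() {
    for snap in snap_1 snap_2; do
        virsh snapshot-create-as "$DOM" "$snap" --disk-only
    done
    virsh blockcommit "$DOM" vda --wait --verbose --top 'vda[1]'
}

# Restart the domain and retry an active commit. Returns 1 if the
# SELinux label error from this report shows up, 0 otherwise.
restart_and_check() {
    virsh destroy "$DOM"
    virsh start "$DOM"
    out=$(virsh blockcommit "$DOM" vda --wait --verbose --active 2>&1)
    case $out in
        *"$LABEL_ERR"*)
            echo "BUG REPRODUCED: $out"
            return 1
            ;;
    esac
    echo "OK: base image is usable"
}
```

Usage, e.g. for scenario 2: `. ./repro.sh; scenario2; restart_and_check`. Note that on a fixed build the final active commit is left in the synchronized phase, so it still has to be pivoted or aborted afterwards.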

Comment 2 Michal Privoznik 2019-08-30 13:20:06 UTC
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2019-August/msg01418.html

Comment 4 Michal Privoznik 2019-09-02 09:31:38 UTC
Fixed upstream as:

16fb3c8b83 qemu_blockjob: Remove secdriver metadata more frequently
7f99d8a739 qemu_blockjob: Print image path on failed security metadata move too
143a0f8b05 qemu_blockjob: Move active commit failed state handling into a function

v5.7.0-rc2

Comment 8 yisun 2019-09-05 10:11:59 UTC
Scenario 2 from comment 0 still fails, and it leaves the base image /var/lib/libvirt/images/jeos-27-x86_64.qcow2 dirty, which causes more failures in subsequent test cases (transcript below).
Changed back to ASSIGNED for now.

(.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

(.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# for i in {snap_1,snap_2}; do virsh snapshot-create-as avocado-vt-vm1 $i --disk-only; done
Domain snapshot snap_1 created
Domain snapshot snap_2 created
(.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --top vda[1]
Block commit: [100 %]
Commit complete
(.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

Domain avocado-vt-vm1 started

(.libvirt-ci-venv-ci-runtest-ovFFGP) [root@dell-per730-62 ~]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
error: internal error: child reported (status=125): Requested operation is not valid: Setting different SELinux label on /var/lib/libvirt/images/jeos-27-x86_64.qcow2 which is already in use

Comment 9 Michal Privoznik 2019-09-16 06:50:17 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2019-September/msg00600.html

Comment 10 Michal Privoznik 2019-09-16 10:35:04 UTC
Another approach implemented (as requested in review):

https://www.redhat.com/archives/libvir-list/2019-September/msg00621.html

Comment 12 Michal Privoznik 2019-09-25 11:52:03 UTC
I've just pushed the fix upstream and backported it:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2019-September/msg01083.html

There's also a scratch build with this patch applied:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23705370

It also contains patches I've proposed for bug 1740024 (but those are not reviewed upstream yet).

Comment 19 yisun 2019-11-15 09:00:37 UTC
Hi Michal, 
Scenario 2 from comment 0 is still reproducible; please take a look. The following is a simpler way to reproduce it:

1. Start with a running VM
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh domblklist avocado-vt-vm1
 Target   Source
------------------------------------------------------------------------
 vda      /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2

(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh domstate avocado-vt-vm1
running

(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
<==== no output: the image has no libvirt security xattrs yet

2. Create some external snapshots of it
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# for i in {1..2}; do virsh snapshot-create-as avocado-vt-vm1 snap_$i snap1-desc --disk-only; done
Domain snapshot snap_1 created
Domain snapshot snap_2 created

3. Run blockcommit WITHOUT --pivot
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh blockcommit avocado-vt-vm1 vda --wait --verbose --active
Block commit: [100 %]
Now in synchronized phase

4. The image file now has the following extended attributes:
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
trusted.libvirt.security.dac="+107:+107"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0:c229,c326"
trusted.libvirt.security.timestamp_dac="1573791864"
trusted.libvirt.security.timestamp_selinux="1573791864"

5. Destroy the VM
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# virsh destroy avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

6. Even though the VM is stopped, the file's xattrs still exist, and if we use "virsh edit $VM" to point the VM at the original image again, the VM cannot be started:
(.libvirt-ci-venv-ci-runtest-tpP3NB) [root@ibm-x3850x6-03 src]# getfattr -m trusted.libvirt.security -d /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
trusted.libvirt.security.dac="+107:+107"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:svirt_image_t:s0:c229,c326"
trusted.libvirt.security.timestamp_dac="1573791864"
trusted.libvirt.security.timestamp_selinux="1573791864"
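To spot the leftover metadata from steps 4 and 6 without reading the getfattr output by eye, a small filter sketch. It assumes getfattr prints text values in double quotes, as in the output above; the function name stale_libvirt_refs is hypothetical. It prints each libvirt reference counter (ref_dac, ref_selinux) that is still above zero.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of
#   getfattr -m trusted.libvirt.security -d IMAGE
# on stdin and prints "<driver> <count>" for every libvirt reference
# counter (ref_dac, ref_selinux, ...) that is still above zero.
stale_libvirt_refs() {
    # Keep only lines like: trusted.libvirt.security.ref_dac="1"
    sed -n 's/^trusted\.libvirt\.security\.ref_\([a-z]*\)="\([0-9]*\)"$/\1 \2/p' |
        awk '$2 > 0'
}
```

For example, after step 5: `getfattr -m trusted.libvirt.security -d "$IMG" | stale_libvirt_refs`. Assuming no other running domain uses the image, any output here means stale metadata of the kind reported above; for the output in step 6 it prints `dac 1` and `selinux 1`.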

Comment 20 yisun 2019-11-15 09:02:08 UTC
Due to the above comment, I'm setting this back to ASSIGNED for now. For the automation scripts, I've submitted a PR so that other cases are not blocked if this failure happens:

https://github.com/autotest/tp-libvirt/pull/2430

Comment 22 Michal Privoznik 2019-11-19 09:15:07 UTC
Patches proposed upstream for the issue mentioned in comment 19:

https://www.redhat.com/archives/libvir-list/2019-November/msg00851.html

Comment 23 Michal Privoznik 2019-11-22 09:56:34 UTC
Pushed upstream:

8fa0374c5b qemuProcessStop: Remove image metadata for running mirror jobs
1c12b86185 qemu: Separate image metadata removal into a function

Comment 27 errata-xmlrpc 2020-02-04 18:28:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0404

