Bug 1781079 - [blockdev enablement] VM cannot be started if we took an external snapshot and destroyed it
Summary: [blockdev enablement] VM cannot be started if we took an external snapshot and de...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-09 10:08 UTC by yisun
Modified: 2020-11-06 04:09 UTC
6 users

Fixed In Version: libvirt-6.0.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:52:05 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
libvirtd.log (1.12 MB, text/plain), 2019-12-09 10:08 UTC, yisun


Links
Red Hat Product Errata RHBA-2020:2017, last updated 2020-05-05 09:54:15 UTC

Description yisun 2019-12-09 10:08:30 UTC
Created attachment 1643233
libvirtd.log

Description:
[blockdev enablement] VM cannot be started if we took an external snapshot and destroyed it

Versions:
libvirt-5.10.0-1.module+el8.2.0+5040+bd433686.x86_64
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64

How reproducible:
100%

Please note:
This BZ may be treated as a downstream clone of bz1762178.


Steps:
1. Have a VM with a virtual disk
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh domblklist avocado-vt-vm1
 Target   Source
------------------------------------------------------------------------
 vda      /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2' index='1'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

2. Create an external snapshot for it
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only
Domain snapshot snap1 created

3. Clear the libvirtd log
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# echo "" > /var/log/libvirtd-debug.log

4. Destroy the VM
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh destroy avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

5. Start the VM; the failure happens
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2019-12-09T09:53:43.135136Z qemu-kvm: -blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}: Could not reopen file: Permission denied

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.snap1'/>
      <backingStore type='file'>
        <format type='qcow2'/>
        <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

6. If we try to start the VM again, it starts normally:
(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

Expected result:
No failure in step 5

Additional info:
The libvirtd debug log is uploaded as an attachment.
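
For convenience, the steps above condensed into a minimal shell sketch (assuming the same avocado-vt-vm1 domain and its qcow2 disk, with SELinux enforcing):

virsh start avocado-vt-vm1
virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only   # external disk-only snapshot
virsh destroy avocado-vt-vm1
virsh start avocado-vt-vm1   # expected to fail: "Could not reopen file: Permission denied"
virsh start avocado-vt-vm1   # a second start attempt succeeds (step 6 above)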

Comment 2 Peter Krempa 2019-12-09 11:50:13 UTC
The problem is that libvirt did not mark the original image struct read-only when creating the snapshot, so the generated command line contained "read-only" set to false in the qcow2 format layer's blockdev parameters:

-blockdev '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' 
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}' 

Since access to the backing file is restricted by SELinux, qemu failed to open it in read-write mode.
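
For comparison, after the fix the backing image's format layer should be opened read-only, so the generated command line would presumably look like this (same node names as above; only the read-only flag changes):

-blockdev '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null}'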

Comment 3 Peter Krempa 2019-12-09 13:16:49 UTC
Fixed upstream:

commit 6f6a1763a1c227b7b5d92ec813c02ce1b26b10a2 
Author: Peter Krempa <pkrempa>
Date:   Mon Dec 9 12:44:41 2019 +0100

    qemu: snapshot: Mark file becoming backingStore as read-only
    
    For any backing file we set 'read-only' to true, but didn't do this when
    modifying the recorded backing store when creating external snapshots.
    
    This meant that qemu would attempt to open the backing-file read-write.
    This would fail for example when selinux is used as qemu doesn't have
    write permission for the backing file.

v5.10.0-118-g6f6a1763a1

Comment 5 yisun 2020-01-19 12:47:17 UTC
Hi Peter, 
The automated case still fails with the latest libvirt, using the same steps. Could you please check whether the fix was lost when libvirt was rebased? Thanks. Putting this back to ASSIGNED for now.
(.libvirt-ci-venv-ci-runtest-jcaFve) [root@dell-per730-67 ~]# rpm -qa | grep libvirt-6
libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64


1. The automated case failed
(.libvirt-ci-venv-ci-runtest-jcaFve) [root@dell-per730-67 ~]# avocado run --vt-type libvirt blockcommit.normal_test.multiple_chain.file_disk.local.no_ga.notimeout.shallow.top_active.without_pivot
JOB ID     : 894d75587a6368792dcb24bfbe444f1a97f2393d
JOB LOG    : /root/avocado/job-results/job-2020-01-19T07.37-894d755/job.log
 (1/1) type_specific.io-github-autotest-libvirt.virsh.blockcommit.normal_test.multiple_chain.file_disk.local.no_ga.notimeout.shallow.top_active.without_pivot: ERROR: VM 'avocado-vt-vm1' failed to start: error: Failed to start domain avocado-vt-vm1\nerror: internal error: process exited while connecting to monitor: 2020-01-19T12:38:21.857667Z qemu-kvm: -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver"... (24.66 s)
RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 26.12 s

2. Manually reproduced:
# virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only
Domain snapshot snap1 created

# virsh destroy avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2020-01-19T12:42:45.179232Z qemu-kvm: -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}: Could not reopen file: Permission denied

Comment 6 Peter Krempa 2020-01-20 13:14:34 UTC
I can't reproduce this issue. Could you please attach the debug logs and the domain XML from before and after the snapshot?

Comment 7 yisun 2020-01-21 07:21:29 UTC
(In reply to Peter Krempa from comment #6)
> I can't reproduce this issue. Could you please attach the debug logs and the
> domain XML from before and after the snapshot?

Hmm, thanks for the tip; I finally found out why the auto case still failed.
The script created three snapshots (snap3 -> snap2 -> snap1 -> base_image), then removed the VM's disk XML and prepared a new disk XML based on the removed part. But when preparing the new disk XML, the script mistakenly used snap1 as the top snapshot. Since the VM's inactive XML now contains the full backing chain information, the script produced a malformed disk XML as follows:
<disk device="disk" type="file">
      <driver name="qemu" type="qcow2" />
      <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap1" />
      <backingStore type="file">
        <format type="qcow2" />
        <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap2" />
        <backingStore type="file">
          <format type="qcow2" />
          <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap1" />
          <backingStore type="file">
            <format type="qcow2" />
            <source file="/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2" />
            <backingStore />
          </backingStore>
        </backingStore>
      </backingStore>
      <target bus="virtio" dev="vda" />
      <address bus="0x04" domain="0x0000" function="0x0" slot="0x00" type="pci" />
    </disk>
The chain is now snap1 -> snap2 -> snap1 -> base_img, i.e. snap1 is used twice in the chain, and that is where the problem happens.
So this should be fixed in the automation script; sorry for the false alarm, the error message was just very similar.
I will close this bug as VERIFIED.
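
For reference, a minimal sketch (not part of the original test suite) that would have caught the duplicated entry in the declared chain; it assumes the avocado-vt-vm1 domain above and that virsh prints source attributes with single quotes:

# Print any image path that occurs more than once in the inactive disk XML;
# empty output means the declared backing chain contains no duplicates.
virsh dumpxml avocado-vt-vm1 --inactive | grep -o "file='[^']*'" | sort | uniq -d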

Comment 8 Peter Krempa 2020-01-21 07:24:58 UTC
Thank you for looking into it! I wouldn't be able to figure out what's happening in the test suite so quickly.

Comment 10 errata-xmlrpc 2020-05-05 09:52:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

