Created attachment 1643233 [details]
libvirtd.log

Description:
[blockdev enablement] VM cannot be started after taking an external snapshot and destroying the VM

Versions:
libvirt-5.10.0-1.module+el8.2.0+5040+bd433686.x86_64
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64

How reproducible:
100%

Please note: this bug may be treated as a downstream clone of bz1762178.

Steps:
1. Have a VM with a virtual disk:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh domblklist avocado-vt-vm1
 Target   Source
------------------------------------------------------------------------
 vda      /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2' index='1'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

2. Create an external snapshot for it:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only
Domain snapshot snap1 created

3. Clear the libvirtd log:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# echo "" > /var/log/libvirtd-debug.log

4. Destroy the VM:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh destroy avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

5. Start the VM; the failure happens:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2019-12-09T09:53:43.135136Z qemu-kvm: -blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}: Could not reopen file: Permission denied

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh dumpxml avocado-vt-vm1 | awk '/<disk/,/<\/disk/'
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.snap1'/>
      <backingStore type='file'>
        <format type='qcow2'/>
        <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

6. If we try to start the VM again, it starts normally:

(.libvirt-ci-venv-ci-runtest-jUzTYn) [root@libvirt-rhel-8 domain]# virsh start avocado-vt-vm1
Domain avocado-vt-vm1 started

Expected result:
No failure in step 5.

Additional info:
The libvirtd log is uploaded as an attachment.
The problem is that libvirt did not mark the original image's source struct as read-only when creating the snapshot, so the generated command line contained 'read-only' set to 'false' in the qcow2 layer's blockdev parameters:

-blockdev '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}'

Since access to the backing file is restricted by SELinux, qemu failed to open it in read-write mode.
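For comparison, a correctly generated command line for the image that became the backing file should mark its qcow2 layer read-only, roughly as sketched below (node names and paths are taken from this report; the exact output of a fixed build may differ):

-blockdev '{"driver":"file","filename":"/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null}'

With 'read-only' set to true, qemu only needs read access to the backing file, so the SELinux restriction no longer causes the reopen to fail.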
Fixed upstream:

commit 6f6a1763a1c227b7b5d92ec813c02ce1b26b10a2
Author: Peter Krempa <pkrempa>
Date:   Mon Dec 9 12:44:41 2019 +0100

    qemu: snapshot: Mark file becoming backingStore as read-only

    For any backing file we set 'read-only' to true, but didn't do this
    when modifying the recorded backing store when creating external
    snapshots. This meant that qemu would attempt to open the backing-file
    read-write. This would fail for example when selinux is used as qemu
    doesn't have write permission for the backing file.

v5.10.0-118-g6f6a1763a1
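A rough way to verify the fix on a patched build, reusing the reproducer commands and node names from the original report (the node name libvirt-2-format is assumed from the step 5 error and may differ on other setups):

# virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only
# virsh destroy avocado-vt-vm1
# virsh start avocado-vt-vm1        <- expected to succeed on a fixed build
# grep -o '"node-name":"libvirt-2-format"[^}]*' /var/log/libvirt/qemu/avocado-vt-vm1.log

On a fixed build, the blockdev fragment for the layer that became the backing store should show "read-only":true.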
Hi Peter,

The automated case still fails with the latest libvirt, using the same steps. Could you please check whether the fix was lost when libvirt was rebased? Thanks. Putting the bug back to Assigned for now.

(.libvirt-ci-venv-ci-runtest-jcaFve) [root@dell-per730-67 ~]# rpm -qa | grep libvirt-6
libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64

1. The automated case failed:

(.libvirt-ci-venv-ci-runtest-jcaFve) [root@dell-per730-67 ~]# avocado run --vt-type libvirt blockcommit.normal_test.multiple_chain.file_disk.local.no_ga.notimeout.shallow.top_active.without_pivot
JOB ID     : 894d75587a6368792dcb24bfbe444f1a97f2393d
JOB LOG    : /root/avocado/job-results/job-2020-01-19T07.37-894d755/job.log
 (1/1) type_specific.io-github-autotest-libvirt.virsh.blockcommit.normal_test.multiple_chain.file_disk.local.no_ga.notimeout.shallow.top_active.without_pivot: ERROR: VM 'avocado-vt-vm1' failed to start: error: Failed to start domain avocado-vt-vm1\nerror: internal error: process exited while connecting to monitor: 2020-01-19T12:38:21.857667Z qemu-kvm: -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver"... (24.66 s)
RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 26.12 s

2. Manually reproduced:

# virsh snapshot-create-as avocado-vt-vm1 snap1 --disk-only
Domain snapshot snap1 created

# virsh destroy avocado-vt-vm1
Domain avocado-vt-vm1 destroyed

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: process exited while connecting to monitor: 2020-01-19T12:42:45.179232Z qemu-kvm: -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}: Could not reopen file: Permission denied
I can't reproduce this issue. Could you please attach the debug logs and the domain XML from before and after the snapshot?
(In reply to Peter Krempa from comment #6)
> I can't reproduce this issue. Could you please attach debug logs and the
> domain XML before and after the snapshot please?

Thanks for the tip; I finally found out why the automated case still fails.

The script creates 3 snapshots, giving the chain snap3 -> snap2 -> snap1 -> base_image. It then removes the VM's disk XML and prepares a new disk XML based on the removed part. But when preparing the new disk XML, the script mistakenly uses snap1 as the top snapshot. Since the VM's inactive XML now contains the full backing chain info, the script ends up preparing an odd disk XML, as follows:

<disk device="disk" type="file">
  <driver name="qemu" type="qcow2" />
  <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap1" />
  <backingStore type="file">
    <format type="qcow2" />
    <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap2" />
    <backingStore type="file">
      <format type="qcow2" />
      <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap1" />
      <backingStore type="file">
        <format type="qcow2" />
        <source file="/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2" />
        <backingStore />
      </backingStore>
    </backingStore>
  </backingStore>
  <target bus="virtio" dev="vda" />
  <address bus="0x04" domain="0x0000" function="0x0" slot="0x00" type="pci" />
</disk>

The chain is therefore snap1 -> snap2 -> snap1 -> base_img, so snap1 appears twice in the chain, which is what triggers the failure. This should be fixed in the automation script; sorry for the false alarm, the error message is just very similar. I will close this bug as verified.
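For reference, the disk XML the script should have prepared uses snap3 as the top of the chain, roughly like the sketch below (the snap3 path is an assumption based on the naming pattern of snap1 and snap2 in the report):

<disk device="disk" type="file">
  <driver name="qemu" type="qcow2" />
  <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap3" />
  <backingStore type="file">
    <format type="qcow2" />
    <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap2" />
    <backingStore type="file">
      <format type="qcow2" />
      <source file="/var/tmp/avocado_ownj7rss/jeos-27-x86_64.snap1" />
      <backingStore type="file">
        <format type="qcow2" />
        <source file="/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2" />
        <backingStore />
      </backingStore>
    </backingStore>
  </backingStore>
  <target bus="virtio" dev="vda" />
  <address bus="0x04" domain="0x0000" function="0x0" slot="0x00" type="pci" />
</disk>

With this layout each image appears exactly once in the chain, so libvirt can open every backing layer without conflict.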
Thank you for looking into it! I wouldn't have been able to figure out what was happening in the test suite so quickly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017