Bug 2057067
Summary: | `virsh blockjob --abort' logs error when cancelling a copy job started with '--reuse-external --shallow', where the target image has a backing file | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Kashyap Chamarthy <kchamart> |
Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
libvirt sub component: | Storage | QA Contact: | Meina Li <meili> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | low | CC: | astupnik, chhu, dzheng, jdenemar, lmen, nanli, pkrempa, virt-maint, xuzhang |
Version: | 9.0 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-8.1.0-1.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-11-15 10:03:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | 8.1.0 |
Embargoed: |
Description
Kashyap Chamarthy
2022-02-22 16:55:05 UTC
My original assumption was that aborting the block job actually propagates the error, but at the point where it happens we no longer propagate it to the caller, so the error is only a log entry.

The cancellation of the block job was actually successful, and the error is spurious because the image was not actually inserted. Thus it can be safely ignored until libvirt is fixed.

The actual problems described in the launchpad issue are caused by qemu crashing and have nothing to do with the block job cancellation reporting errors.

To reproduce the issue the following steps are necessary:

1) create a VM with a disk image which has at least one backing image, or create a snapshot. E.g.:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/img.qcow2' index='1'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/tmp/copybase.qcow2'/>
        <backingStore/>
      </backingStore>
      <target dev='hdd' bus='ide'/>
      <alias name='ide0-1-1'/>
      <address type='drive' controller='0' bus='1' target='0' unit='1'/>
    </disk>

2) create the destination images:

    cp /tmp/copybase.qcow2 /tmp/copycopy.qcow2
    qemu-img create -f qcow2 -F qcow2 -b /tmp/copycopy.qcow2 /tmp/copy.qcow2

(no need to actually copy the original image, you can create a dummy one; the data will not be consistent, but we are going to cancel the job anyway)

3) start the copy job

    virsh blockcopy $VM --path $DISKTARGET --dest /tmp/copy.qcow2 --reuse-external --shallow --transient-job

4) abort the blockjob

    virsh blockjob --abort $VM $DISKTARGET

The log file will have the error mentioned in the description.
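Steps 2) to 4) above can be collected into a small script. The following is only a sketch under assumptions: the function names `run` and `repro_shallow_copy_abort` and the `DRY_RUN` variable are hypothetical helpers introduced here, not part of libvirt. The image paths and virsh invocations are the ones quoted in the steps. By default the commands are only printed, so nothing touches a guest until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around steps 2) to 4) of the reproducer.
# $1 = domain name, $2 = disk target (e.g. hdd); step 1) is assumed done.
repro_shallow_copy_abort() {
    vm=$1
    disk=$2

    # Step 2: destination image with its own backing file.
    run cp /tmp/copybase.qcow2 /tmp/copycopy.qcow2 || return 1
    run qemu-img create -f qcow2 -F qcow2 \
        -b /tmp/copycopy.qcow2 /tmp/copy.qcow2 || return 1

    # Step 3: shallow copy into the pre-created image.
    run virsh blockcopy "$vm" --path "$disk" --dest /tmp/copy.qcow2 \
        --reuse-external --shallow --transient-job || return 1

    # Step 4: abort instead of pivoting.
    run virsh blockjob --abort "$vm" "$disk"
}
```

Calling `repro_shallow_copy_abort "$VM" "$DISKTARGET"` prints the four commands in order; setting `DRY_RUN=0` first runs them for real.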
Fixed upstream:

commit 14851cff117a5cb77f0543f0ca5b72d10b83b8e5
Author: Peter Krempa <pkrempa>
Date:   Tue Feb 22 17:34:46 2022 +0100

    qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images

    In case when a user starts a block copy operation with VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and both the reused image and the original disk have a backing image libvirt specifically does not insert the backing image until after the job is asked to be completed via virBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT. This is so that management applications can copy the backing image on the background.

    Now when a user aborts the block job instead of cancelling it we'd ignore the fact that we didn't insert the backing image yet and the cancellation would result into a 'blockdev-del' of a invalid node name and thus an 'error' severity entry in the log.

    To solve this issue we use the same conditions when the backing image addition is avoided to remove the internal state for them prior to the call to unplug the mirror destination.

    Reported-by: Kashyap Chamarthy <kchamart>
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v8.0.0-469-g14851cff11

Reproduced version:
libvirt-8.0.0-5.el9.x86_64
qemu-kvm-6.2.0-10.el9.x86_64

Reproduced Steps:

1. Prepare a running guest.
# virsh domstate lmn
running

2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | grep /disk -B10
......
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/lmn.qcow2'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>

3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 500M
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

4. Do blockcopy to /tmp/copy.qcow2
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started

5. Abort the blockjob.
# virsh blockjob lmn vda --abort
error: invalid argument: disk vda does not have an active block job

6. Check the libvirtd.log:
......
2022-02-25 02:45:57.723+0000: 2401: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}]
2022-02-25 02:45:57.723+0000: 2401: info : qemuMonitorJSONIOProcessLine:241 : QEMU_MONITOR_RECV_REPLY: mon=0x7f302c082460 reply={"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: debug : qemuMonitorJSONCheckErrorFull:387 : unable to execute QEMU command {"execute":"blockdev-del","arguments":{"node-name":"libvirt-8-format"},"id":"libvirt-20"}: {"id":"libvirt-20","error":{"class":"GenericError","desc":"Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: error : qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-8-format'
......
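The log check in step 6 can be automated with a plain grep. This is a minimal sketch, not a libvirt tool: `count_blockdev_del_errors` is a hypothetical helper, and the pattern is derived only from the 'error' severity line quoted above. The log location is passed as an argument because it depends on the host's logging configuration.

```shell
#!/bin/sh
# Hypothetical helper: count 'error'-severity entries reporting a failed
# 'blockdev-del' in the given libvirtd log file.
# grep -c prints the number of matching lines (0 when there are none).
count_blockdev_del_errors() {
    grep -c "error : .*unable to execute QEMU command 'blockdev-del'" "$1"
}
```

On an affected build the count is non-zero after the abort in step 5. The debug and info lines for the same reply are deliberately not matched, since only the 'error' severity entry is the subject of this bug.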
Pre-verified in libvirt-8.1.0-1.fc35.x86_64 and qemu-kvm-6.1.0-14.fc35.x86_64: PASSED

Verified Version:
libvirt-8.3.0-1.el9.x86_64
qemu-kvm-7.0.0-2.el9.x86_64

Verified Steps:

S1: Do blockcopy to file disk with backing file

1. Prepare a running guest.
# virsh domstate lmn
running

2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/lmn.qcow2'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>

3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 10G
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/lmn.qcow2'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>

5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="file" device="disk">
  <driver name="qemu" type="qcow2"/>
  <source file="/tmp/copy.qcow2" index="9"/>
  <backingStore type="file" index="10">
    <format type="qcow2"/>
    <source file="/var/lib/libvirt/images/test.img"/>
    <backingStore/>
  </backingStore>
  <target dev="vda" bus="virtio"/>
  <alias name="virtio-disk0"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

S2: Do blockcopy to block disk with backing file

1. Prepare a running guest.
# virsh domstate lmn
running

2. Create snapshot for the guest.
# virsh snapshot-create-as lmn --no-metadata --reuse-external --disk-only --diskspec vdb,file=/dev/vg0/lv1,stype=block
Domain snapshot 1652164878 created
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
  <driver name="qemu" type="qcow2" cache="none"/>
  <source dev="/dev/vg0/lv1" index="2"/>
  <backingStore type="block" index="1">
    <format type="raw"/>
    <source dev="/dev/vg0/lv0"/>
    <backingStore/>
  </backingStore>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

3. Create a block disk which has another backing file.
# qemu-img create -f qcow2 -F qcow2 -b /dev/vg0/lv3 /dev/vg0/lv4
Formatting '/dev/vg0/lv4', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=104857600 backing_file=/dev/vg0/lv3 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16

4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
  <driver name="qemu" type="qcow2" cache="none"/>
  <source dev="/dev/vg0/lv1" index="2"/>
  <backingStore type="block" index="1">
    <format type="raw"/>
    <source dev="/dev/vg0/lv0"/>
    <backingStore/>
  </backingStore>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
  <driver name="qemu" type="qcow2" cache="none"/>
  <source dev="/dev/vg0/lv4" index="5"/>
  <backingStore type="block" index="6">
    <format type="qcow2"/>
    <source dev="/dev/vg0/lv3"/>
    <backingStore/>
  </backingStore>
  <target dev="vdb" bus="virtio"/>
  <alias name="virtio-disk1"/>
  <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>

Neither scenario produces an error in the libvirtd log.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003
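To tell whether a host is expected to carry the fix, the libvirt version can be compared against the upstream target version 8.1.0 given in the bug header. This is a rough sketch with assumptions stated plainly: `libvirt_has_fix` is a hypothetical helper, it relies on the `-V` (version sort) option of GNU sort, and it compares upstream version numbers only, so it cannot see distribution backports into older packages.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given upstream libvirt version
# (e.g. "8.3.0") is at or after 8.1.0, the first release with the fix.
# Assumes GNU sort with -V (version sort) is available.
libvirt_has_fix() {
    fixed=8.1.0
    # The lower of the two versions sorts first; the fix is present
    # when that lower version is the fixed one.
    lowest=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}
```

One possible use is `libvirt_has_fix "$(virsh --version)" && echo "fix expected"`, which assumes `virsh --version` prints a bare version number.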