Bug 2057067
| Summary: | `virsh blockjob --abort' logs error when cancelling a copy job started with '--reuse-external --shallow', where the target image has a backing file | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Kashyap Chamarthy <kchamart> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| libvirt sub component: | Storage | QA Contact: | Meina Li <meili> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | astupnik, chhu, dzheng, jdenemar, lmen, nanli, pkrempa, virt-maint, xuzhang |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-8.1.0-1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-15 10:03:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 8.1.0 |
| Embargoed: | |||
My original assumption was that aborting the block job propagates the error to the caller, but at the point where the error occurs it is no longer propagated, so it shows up only as a log entry. The cancellation of the block job itself succeeds, and the error is spurious because the backing image was never inserted. It can therefore be safely ignored until libvirt is fixed. The actual problems described in the Launchpad issue are caused by qemu crashing and have nothing to do with the block job cancellation reporting errors.
To reproduce the issue the following steps are necessary:
1) create a VM with a disk image which has at least one backing image, or create a snapshot. E.g.:
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/img.qcow2' index='1'/>
<backingStore type='file' index='5'>
<format type='qcow2'/>
<source file='/tmp/copybase.qcow2'/>
<backingStore/>
</backingStore>
<target dev='hdd' bus='ide'/>
<alias name='ide0-1-1'/>
<address type='drive' controller='0' bus='1' target='0' unit='1'/>
</disk>
2) create the destination images:
cp /tmp/copybase.qcow2 /tmp/copycopy.qcow2
qemu-img create -f qcow2 -F qcow2 -b /tmp/copycopy.qcow2 /tmp/copy.qcow2
(There is no need to copy the original image; a dummy one will do. The data will not be consistent, but the job is going to be cancelled anyway.)
3) start the copy job
virsh blockcopy $VM --path $DISKTARGET --dest /tmp/copy.qcow2 --reuse-external --shallow --transient-job
4) abort the blockjob
virsh blockjob --abort $VM $DISKTARGET
The log file will have the error mentioned in the description.
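Steps 3 and 4 above can be wrapped in a small script for repeated runs. This is only a sketch: the function names and the DRY_RUN switch are made up for illustration, and the VM name, disk target and destination image are assumed to be supplied by the caller. The virsh arguments are the ones from the steps above.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a shallow copy onto a pre-created image, then abort the job.
# $1 = VM name, $2 = disk target, $3 = destination image with a backing file
reproduce_abort() {
    vm=$1
    disk=$2
    dest=$3
    run virsh blockcopy "$vm" --path "$disk" --dest "$dest" \
        --reuse-external --shallow --transient-job
    run virsh blockjob --abort "$vm" "$disk"
}
```

With DRY_RUN=1 the function only prints the two virsh command lines, which is handy for checking the arguments before touching a running guest.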
Fixed upstream:
commit 14851cff117a5cb77f0543f0ca5b72d10b83b8e5
Author: Peter Krempa <pkrempa>
Date: Tue Feb 22 17:34:46 2022 +0100
qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
In case when a user starts a block copy operation with
VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
both the reused image and the original disk have a backing image libvirt
specifically does not insert the backing image until after the job is
asked to be completed via virBlockJobAbort with
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.
This is so that management applications can copy the backing image on
the background.
Now when a user aborts the block job instead of cancelling it we'd
ignore the fact that we didn't insert the backing image yet and the
cancellation would result into a 'blockdev-del' of a invalid node name
and thus an 'error' severity entry in the log.
To solve this issue we use the same conditions when the backing image
addition is avoided to remove the internal state for them prior to the
call to unplug the mirror destination.
Reported-by: Kashyap Chamarthy <kchamart>
Signed-off-by: Peter Krempa <pkrempa>
Reviewed-by: Ján Tomko <jtomko>
v8.0.0-469-g14851cff11
Reproduced version:
libvirt-8.0.0-5.el9.x86_64
qemu-kvm-6.2.0-10.el9.x86_64
Reproduced Steps:
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | grep /disk -B10
......
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.qcow2'/>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 500M
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy to /tmp/copy.qcow2
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
5. Abort the blockjob.
# virsh blockjob lmn vda --abort
error: invalid argument: disk vda does not have an active block job
6. Check the libvirtd.log:
......
2022-02-25 02:45:57.723+0000: 2401: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}]
2022-02-25 02:45:57.723+0000: 2401: info : qemuMonitorJSONIOProcessLine:241 : QEMU_MONITOR_RECV_REPLY: mon=0x7f302c082460 reply={"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: debug : qemuMonitorJSONCheckErrorFull:387 : unable to execute QEMU command {"execute":"blockdev-del","arguments":{"node-name":"libvirt-8-format"},"id":"libvirt-20"}: {"id":"libvirt-20","error":{"class":"GenericError","desc":"Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: error : qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-8-format'
......
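The log check in step 6 can be scripted. The following is a minimal sketch, assuming the caller passes the path of the daemon log; the function name is made up for illustration. It counts only 'error' severity lines, so the debug and info lines that echo the same QEMU reply are not counted.

```shell
#!/bin/sh
# Count 'error' severity log entries produced by a failed blockdev-del.
# $1 = path to the libvirt daemon log (assumed to be given by the caller)
count_blockdev_del_errors() {
    log_file=$1
    # grep -c exits non-zero when nothing matches; '|| true' keeps the
    # function usable under 'set -e' while still printing 0.
    grep -c -E ' error : .*blockdev-del.*Failed to find node with node-name=' \
        "$log_file" || true
}
```

On an affected build the count is expected to be non-zero after the abort; on a fixed build it should stay at 0.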
Pre-verified in libvirt-8.1.0-1.fc35.x86_64 and qemu-kvm-6.1.0-14.fc35.x86_64: PASSED
Verified Version:
libvirt-8.3.0-1.el9.x86_64
qemu-kvm-7.0.0-2.el9.x86_64
Verified Steps:
S1: Do blockcopy to file disk with backing file
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.qcow2'/>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 10G
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/var/lib/libvirt/images/lmn.qcow2'/>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="file" device="disk">
<driver name="qemu" type="qcow2"/>
<source file="/tmp/copy.qcow2" index="9"/>
<backingStore type="file" index="10">
<format type="qcow2"/>
<source file="/var/lib/libvirt/images/test.img"/>
<backingStore/>
</backingStore>
<target dev="vda" bus="virtio"/>
<alias name="virtio-disk0"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
S2: Do blockcopy to block disk with backing file
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn --no-metadata --reuse-external --disk-only --diskspec vdb,file=/dev/vg0/lv1,stype=block
Domain snapshot 1652164878 created
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
<driver name="qemu" type="qcow2" cache="none"/>
<source dev="/dev/vg0/lv1" index="2"/>
<backingStore type="block" index="1">
<format type="raw"/>
<source dev="/dev/vg0/lv0"/>
<backingStore/>
</backingStore>
<target dev="vdb" bus="virtio"/>
<alias name="virtio-disk1"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
3. Create a block disk which has another backing file.
# qemu-img create -f qcow2 -F qcow2 -b /dev/vg0/lv3 /dev/vg0/lv4
Formatting '/dev/vg0/lv4', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=104857600 backing_file=/dev/vg0/lv3 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
<driver name="qemu" type="qcow2" cache="none"/>
<source dev="/dev/vg0/lv1" index="2"/>
<backingStore type="block" index="1">
<format type="raw"/>
<source dev="/dev/vg0/lv0"/>
<backingStore/>
</backingStore>
<target dev="vdb" bus="virtio"/>
<alias name="virtio-disk1"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
<driver name="qemu" type="qcow2" cache="none"/>
<source dev="/dev/vg0/lv4" index="5"/>
<backingStore type="block" index="6">
<format type="qcow2"/>
<source dev="/dev/vg0/lv3"/>
<backingStore/>
</backingStore>
<target dev="vdb" bus="virtio"/>
<alias name="virtio-disk1"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
Neither scenario produces an error in the libvirtd log.
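The dumpxml comparisons in S1 and S2 come down to checking which image is the top layer of the disk after the abort or the pivot. A rough helper for that, assuming the XML on stdin contains only the disk being checked (the function name is hypothetical):

```shell
#!/bin/sh
# Print the path from the first <source file=...> or <source dev=...>
# element read from stdin. Both single and double quoted attributes are
# accepted, since the dumps above use both styles.
first_disk_source() {
    sed -n -E "s/.*<source (file|dev)=[\"']([^\"']*)[\"'].*/\2/p" | head -n 1
}
```

After an abort the printed path should be the same as before the copy started; after a pivot it should be the copy destination.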
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:8003 |
Description of problem
----------------------

[Thanks to Peter Krempa for the bug title summary.]

The test here is an OpenStack CI test. Below is the rough libvirt/QEMU sequence:

`virsh blockjob --abort' fails when cancelling a copy/mirror job that is started with '--reuse-external --shallow', where the target image has a backing image. The failure is:

internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-4-storage'

Where:

* "--reuse-external" == reuse an existing external file on the destination host for the mirror/copy job
* "--shallow" == the copy shares the backing chain

And the rough underlying QEMU call sequence here is:

- blockdev-add,
- blockdev-mirror,
- block-job-cancel,
- job-dismiss,
- blockdev-del ... which fails with the above "internal error"

Root cause analysis
-------------------

This is based on an IRC chat with Peter: libvirt has a piece of code which ensures that the backing image of the reused destination image is added only when finishing the job. On cancellation of the [copy] job, we want to unplug the image, but the backing image was not yet plugged in. However, the test is doing a `block-job-cancel' here, and the cancellation path most likely still expects that the backing image was already plugged in.

Version
-------

- libvirt version is 7.10.0;
- QEMU is 6.1.0-5

How reproducible: Consistently (in the OpenStack CI)

Steps to Reproduce
------------------

The bug was triggered by OpenStack test code here: https://bugs.launchpad.net/tripleo/+bug/1959014/

The test roughly boots the server, snapshots it, and tries to upload the image to Glance (the image template storage service).

Actual results
--------------

Copy job cancellation fails with:

internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-4-storage'

Expected results
----------------

The call to `blockdev-del` doesn't fail on [copy] job cancel.