Bug 2057067 - `virsh blockjob --abort' logs error when cancelling a copy job started with '--reuse-external --shallow', where the target image has a backing file
Summary: `virsh blockjob --abort' logs error when cancelling a copy job started with '--reuse-external --shallow', where the target image has a backing file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Assignee: Peter Krempa
QA Contact: Meina Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-22 16:55 UTC by Kashyap Chamarthy
Modified: 2023-10-17 15:21 UTC
CC List: 9 users

Fixed In Version: libvirt-8.1.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 10:03:40 UTC
Type: Bug
Target Upstream Version: 8.1.0
Embargoed:


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   LIBVIRTAT-13007  0        None      None    None     2022-07-08 08:11:53 UTC
Red Hat Issue Tracker   RHELPLAN-113070  0        None      None    None     2022-07-08 08:07:55 UTC
Red Hat Product Errata  RHSA-2022:8003   0        None      None    None     2022-11-15 10:04:28 UTC

Description Kashyap Chamarthy 2022-02-22 16:55:05 UTC
Description of problem
----------------------

[Thanks to Peter Krempa for the bug title summary.]

The test here is an OpenStack CI test; the rough libvirt/QEMU call
sequence is given further below.

`virsh blockjob --abort' fails when cancelling a copy/mirror job that
was started with '--reuse-external --shallow', where the target image
has a backing image.

And the failure is:

        internal error: unable to execute QEMU command 'blockdev-del':
        Failed to find node with node-name='libvirt-4-storage'

Where:

* "--reuse-external" == reuse an existing external file on the
  destination host for the mirror/copy job

* "--shallow" == the copy shares the backing chain

And the rough underlying QEMU call sequence here is:

    - blockdev-add,
    - blockdev-mirror,
    - block-job-cancel, 
    - job-dismiss, 
    - blockdev-del ... which fails with the above "internal error"
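
For illustration only, here is a rough shell sketch of that QMP sequence
driven by hand through `virsh qemu-monitor-command' (all node names, the
"copy-vda" job id and the target path are hypothetical placeholders rather
than the exact names libvirt generates, and these commands are not meant
to be replayed against a libvirt-managed guest):

    # add the reused target image: protocol node, then format node; the
    # backing image is deliberately left out ("backing":null) because
    # libvirt defers plugging it in until the job is pivoted
    virsh qemu-monitor-command $VM '{"execute":"blockdev-add","arguments":{"driver":"file","filename":"/tmp/copy.qcow2","node-name":"libvirt-6-storage"}}'
    virsh qemu-monitor-command $VM '{"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"libvirt-6-format","file":"libvirt-6-storage","backing":null}}'
    # start the shallow mirror onto the reused image ("sync":"top")
    virsh qemu-monitor-command $VM '{"execute":"blockdev-mirror","arguments":{"job-id":"copy-vda","device":"libvirt-1-format","target":"libvirt-6-format","sync":"top","auto-dismiss":false}}'
    # abort instead of pivoting, then dismiss the job
    virsh qemu-monitor-command $VM '{"execute":"block-job-cancel","arguments":{"device":"copy-vda"}}'
    virsh qemu-monitor-command $VM '{"execute":"job-dismiss","arguments":{"id":"copy-vda"}}'
    # cleanup: deleting the target's own nodes works, but the node name
    # reserved for its backing image was never added, so the last
    # blockdev-del is the one that logs the "Failed to find node" error
    virsh qemu-monitor-command $VM '{"execute":"blockdev-del","arguments":{"node-name":"libvirt-6-format"}}'
    virsh qemu-monitor-command $VM '{"execute":"blockdev-del","arguments":{"node-name":"libvirt-6-storage"}}'
    virsh qemu-monitor-command $VM '{"execute":"blockdev-del","arguments":{"node-name":"libvirt-7-storage"}}'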


Root cause analysis
-------------------

This is based on an IRC chat with Peter:

   libvirt has a piece of code which ensures that the backing image of
   the reused destination image is added only when finishing the job.
   On cancellation of the [copy] job, we want to unplug the image, but
   the backing image was not yet plugged in.

   However, the test is doing a `block-job-cancel' here, and the unplug
   code path still expects that the backing image was already plugged
   in, so it tries to delete a node that was never added.
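
One way to see this from outside libvirt (a sketch, assuming a guest
named $VM; virsh will warn that manual monitor commands taint the domain)
is to list the node names QEMU actually knows about while the copy job is
running and confirm that the name from the error message is not among
them:

    virsh qemu-monitor-command $VM --pretty '{"execute":"query-named-block-nodes"}' | grep '"node-name"'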

Version
-------

  - libvirt version is 7.10.0;
  - QEMU is 6.1.0-5

How reproducible: Consistently (in the OpenStack CI)


Steps to Reproduce
------------------

The bug was triggered by OpenStack test code here:
https://bugs.launchpad.net/tripleo/+bug/1959014/ 

The test roughly boots a server, snapshots it, and tries to upload the
resulting image to Glance (the image storage service).
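
At the OpenStack CLI level that corresponds to roughly the following (a
sketch only; the flavor, image and network names are hypothetical and the
actual CI test does considerably more):

    openstack server create --flavor m1.small --image cirros --network private repro-vm
    openstack server image create --name repro-vm-snap repro-vm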


Actual results
--------------

Copy job cancellation fails with:

    internal error: unable to execute QEMU command 'blockdev-del':
    Failed to find node with node-name='libvirt-4-storage' 


Expected results
----------------

The call to `blockdev-del` doesn't fail on [copy] job cancel.

Comment 1 Peter Krempa 2022-02-23 12:17:41 UTC
My original assumption was that the aborting of the block job actually propagates the error, but at the point where it happens we no longer propagate it to the caller, so the error is only a log entry.

The cancellation of the block job was actually successful, and the error is spurious because the image was not actually inserted. Thus it can be safely ignored until libvirt is fixed.

The actual problems described in the launchpad issue are actually caused by qemu crashing and have nothing to do with the block job cancellation reporting errors.

Comment 2 Peter Krempa 2022-02-23 12:25:58 UTC
To reproduce the issue the following steps are necessary:

1) create a VM with a disk image which has at least one backing image, or create a snapshot. E.g.:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/img.qcow2' index='1'/>
      <backingStore type='file' index='5'>
        <format type='qcow2'/>
        <source file='/tmp/copybase.qcow2'/>
        <backingStore/>
      </backingStore>
      <target dev='hdd' bus='ide'/>
      <alias name='ide0-1-1'/>
      <address type='drive' controller='0' bus='1' target='0' unit='1'/>
    </disk>

2) create the destination images:

cp /tmp/copybase.qcow2 /tmp/copycopy.qcow2
qemu-img create -f qcow2 -F qcow2 -b /tmp/copycopy.qcow2 /tmp/copy.qcow2

(There is no need to actually copy the original image; you can create a dummy one. The data will not be consistent, but we are going to cancel the job anyway.)

3) start the copy job
virsh blockcopy $VM --path $DISKTARGET --dest /tmp/copy.qcow2 --reuse-external --shallow --transient-job

4) abort the blockjob
virsh blockjob --abort $VM $DISKTARGET

The log file will have the error mentioned in the description.
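
To look for it quickly (a sketch; the path assumes libvirtd is configured with a file-based log via log_outputs, otherwise the message ends up in the journal, e.g. under the virtqemud unit on releases using the modular daemons):

grep "unable to execute QEMU command 'blockdev-del'" /var/log/libvirt/libvirtd.log

or, with journald-based logging:

journalctl -u libvirtd | grep blockdev-del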

Comment 3 Peter Krempa 2022-02-23 12:26:45 UTC
Fixed upstream:

commit 14851cff117a5cb77f0543f0ca5b72d10b83b8e5
Author: Peter Krempa <pkrempa>
Date:   Tue Feb 22 17:34:46 2022 +0100

    qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
    
    In case when a user starts a block copy operation with
    VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
    both the reused image and the original disk have a backing image libvirt
    specifically does not insert the backing image until after the job is
    asked to be completed via virBlockJobAbort with
    VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.
    
    This is so that management applications can copy the backing image on
    the background.
    
    Now when a user aborts the block job instead of cancelling it we'd
    ignore the fact that we didn't insert the backing image yet and the
    cancellation would result into a 'blockdev-del' of a invalid node name
    and thus an 'error' severity entry in the log.
    
    To solve this issue we use the same conditions when the backing image
    addition is avoided to remove the internal state for them prior to the
    call to unplug the mirror destination.
    
    Reported-by: Kashyap Chamarthy <kchamart>
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v8.0.0-469-g14851cff11

Comment 4 Meina Li 2022-02-25 03:27:55 UTC
Reproduced version:
libvirt-8.0.0-5.el9.x86_64
qemu-kvm-6.2.0-10.el9.x86_64

Reproduced Steps:
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | grep /disk -B10
......
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/lmn.qcow2'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 500M
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=524288000 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy to /tmp/copy.qcow2
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
5. Abort the blockjob.
# virsh blockjob lmn vda --abort
error: invalid argument: disk vda does not have an active block job
6. Check the libvirtd.log:
......
2022-02-25 02:45:57.723+0000: 2401: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}]
2022-02-25 02:45:57.723+0000: 2401: info : qemuMonitorJSONIOProcessLine:241 : QEMU_MONITOR_RECV_REPLY: mon=0x7f302c082460 reply={"id": "libvirt-20", "error": {"class": "GenericError", "desc": "Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: debug : qemuMonitorJSONCheckErrorFull:387 : unable to execute QEMU command {"execute":"blockdev-del","arguments":{"node-name":"libvirt-8-format"},"id":"libvirt-20"}: {"id":"libvirt-20","error":{"class":"GenericError","desc":"Failed to find node with node-name='libvirt-8-format'"}}
2022-02-25 02:45:57.723+0000: 2554: error : qemuMonitorJSONCheckErrorFull:399 : internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-8-format'
......

Comment 5 Meina Li 2022-02-25 06:43:35 UTC
Pre-verified in libvirt-8.1.0-1.fc35.x86_64 and qemu-kvm-6.1.0-14.fc35.x86_64: PASSED

Comment 8 Meina Li 2022-05-10 06:55:09 UTC
Verified Version:
libvirt-8.3.0-1.el9.x86_64
qemu-kvm-7.0.0-2.el9.x86_64

Verified Steps:
S1: Do blockcopy to a file disk with a backing file
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn s1 --disk-only
Domain snapshot s1 created
# virsh dumpxml lmn | xmllint --xpath //disk -
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/lmn.qcow2'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
3. Create a disk image which has another backing file.
# qemu-img create -f qcow2 /var/lib/libvirt/images/test.img 10G
Formatting '/var/lib/libvirt/images/test.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
# qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/test.img /tmp/copy.qcow2
Formatting '/tmp/copy.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 backing_file=/var/lib/libvirt/images/test.img backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/lmn.s1' index='2'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/lmn.qcow2'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vda /tmp/copy.qcow2 --reuse-external --shallow --transient-job
Block Copy started
# virsh blockjob lmn vda --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/tmp/copy.qcow2" index="9"/>
      <backingStore type="file" index="10">
        <format type="qcow2"/>
        <source file="/var/lib/libvirt/images/test.img"/>
        <backingStore/>
      </backingStore>
      <target dev="vda" bus="virtio"/>
      <alias name="virtio-disk0"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>

S2: Do blockcopy to a block disk with a backing file
1. Prepare a running guest.
# virsh domstate lmn
running
2. Create snapshot for the guest.
# virsh snapshot-create-as lmn --no-metadata --reuse-external --disk-only --diskspec vdb,file=/dev/vg0/lv1,stype=block
Domain snapshot 1652164878 created
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source dev="/dev/vg0/lv1" index="2"/>
      <backingStore type="block" index="1">
        <format type="raw"/>
        <source dev="/dev/vg0/lv0"/>
        <backingStore/>
      </backingStore>
      <target dev="vdb" bus="virtio"/>
      <alias name="virtio-disk1"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
3. Create a block disk which has another backing file.
# qemu-img create -f qcow2 -F qcow2 -b /dev/vg0/lv3  /dev/vg0/lv4
Formatting '/dev/vg0/lv4', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=104857600 backing_file=/dev/vg0/lv3 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
4. Do blockcopy and then abort the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --abort
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source dev="/dev/vg0/lv1" index="2"/>
      <backingStore type="block" index="1">
        <format type="raw"/>
        <source dev="/dev/vg0/lv0"/>
        <backingStore/>
      </backingStore>
      <target dev="vdb" bus="virtio"/>
      <alias name="virtio-disk1"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
5. Do blockcopy and then pivot the blockjob.
# virsh blockcopy lmn vdb /dev/vg0/lv4 --reuse-external --shallow --transient-job --blockdev
Block Copy started
# virsh blockjob lmn vdb --pivot
# virsh dumpxml lmn | xmllint --xpath //disk -
<disk type="block" device="disk">
      <driver name="qemu" type="qcow2" cache="none"/>
      <source dev="/dev/vg0/lv4" index="5"/>
      <backingStore type="block" index="6">
        <format type="qcow2"/>
        <source dev="/dev/vg0/lv3"/>
        <backingStore/>
      </backingStore>
      <target dev="vdb" bus="virtio"/>
      <alias name="virtio-disk1"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>

Neither scenario produces the error in the libvirtd log.

Comment 10 errata-xmlrpc 2022-11-15 10:03:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003

