Bug 1946918 - Undefining a guest with --snapshots-metadata fails after blockpull
Summary: Undefining a guest with --snapshots-metadata fails after blockpull
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Peter Krempa
QA Contact: Meina Li
URL:
Whiteboard:
Depends On:
Blocks: 1948619
 
Reported: 2021-04-07 08:59 UTC by Meina Li
Modified: 2021-12-07 22:02 UTC
CC List: 4 users

Fixed In Version: libvirt-7.3.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1948619 (view as bug list)
Environment:
Last Closed: 2021-12-07 21:57:54 UTC
Type: Bug
Target Upstream Version: 7.3.0
Embargoed:


Attachments
bt.log (12.38 KB, text/plain) - 2021-04-07 08:59 UTC, Meina Li
libvirtd.log (1.48 MB, text/plain) - 2021-04-08 03:43 UTC, Meina Li
coredump.log (15.40 KB, text/plain) - 2021-04-09 07:37 UTC, Meina Li

Description Meina Li 2021-04-07 08:59:19 UTC
Created attachment 1769793 [details]
bt.log

Description of problem:
Undefining a guest with --snapshots-metadata fails after blockpull

Version-Release number of selected component (if applicable):
libvirt-7.0.0-4.el9.x86_64
qemu-kvm-5.2.0-11.el9.x86_64
kernel-5.11.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running guest.
# virsh domstate avocado-vt-vm1 
running

2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as avocado-vt-vm1 s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created

3. Do blockpull for the virtual disk.
# virsh blockpull avocado-vt-vm1 vda --wait --verbose --async
Block Pull: [100 %]
Pull complete

4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy avocado-vt-vm1 
Domain 'avocado-vt-vm1' destroyed
# virsh undefine avocado-vt-vm1 --snapshots-metadata
error: Disconnected from qemu:///system due to end of file
error: Failed to undefine domain 'avocado-vt-vm1'
error: End of file while reading data: Input/output error

Actual results:
Undefining the guest fails.

Expected results:
Undefining the guest succeeds.

Additional info:
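For reference, the flattened chain after step 3 can also be inspected before destroying the guest (only a sketch; the domain name and disk target are the ones used above, and -U/--force-share is needed because the image is still in use by the running guest):

# virsh domblklist avocado-vt-vm1
# qemu-img info -U --backing-chain "$(virsh domblklist avocado-vt-vm1 | awk '/vda/ {print $2}')"

After a full pull the topmost image should no longer report a backing file.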

Comment 1 Peter Krempa 2021-04-07 12:20:08 UTC
I didn't manage to reproduce the problem. Could you please also attach debug logs of libvirtd when the issue reproduces?

Additionally, the attached backtrace is from an idle libvirtd, not from a crashed one, so it doesn't help at all. If libvirtd really crashed, the backtrace needs to be collected from the crashed process.
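For reference, one common way to collect such libvirtd debug logs (a sketch; the filter string can be tuned as needed) is to set log filters and a file output in /etc/libvirt/libvirtd.conf and restart the daemon:

# cat >> /etc/libvirt/libvirtd.conf << 'EOF'
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
EOF
# systemctl restart libvirtd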

Comment 2 Meina Li 2021-04-08 03:42:36 UTC
(In reply to Peter Krempa from comment #1)
> I didn't manage to reproduce the problem. Could you please also attach
> debug logs of libvirtd when the issue reproduces?
> 
> Additionally, the attached backtrace is from an idle libvirtd, not from
> a crashed one, so it doesn't help at all. If libvirtd really crashed,
> the backtrace needs to be collected from the crashed process.

I found that the guest is actually undefined successfully even though it reports the "Input/output error". And judging from the backtrace, libvirtd may not have crashed.
Please check the libvirtd log. 
Thank you.

Comment 3 Meina Li 2021-04-08 03:43:22 UTC
Created attachment 1770110 [details]
libvirtd.log

Comment 4 Peter Krempa 2021-04-09 07:11:16 UTC
Could you please try getting the coredump backtrace from the actual crash, e.g. via "coredumpctl dump /usr/sbin/libvirtd"?
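For reference, a fuller backtrace can usually be obtained by loading the core in gdb with debuginfo installed (a sketch; exact package names may differ):

# dnf debuginfo-install libvirt glib2
# coredumpctl gdb /usr/sbin/libvirtd
(gdb) thread apply all bt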

Comment 5 Meina Li 2021-04-09 07:37:30 UTC
Created attachment 1770504 [details]
coredump.log

Comment 6 Peter Krempa 2021-04-09 07:43:04 UTC
Thank you! The important part is:
          
                Stack trace of thread 1099086:
                #0  0x00007fb5d40da7bd virObjectUnref (libvirt.so.0 + 0x10a7bd)
                #1  0x00007fb5d40f24c1 virStorageSourceBackingStoreClear (libvirt.so.0 + 0x1224c1)
                #2  0x00007fb5d40fb7d3 virStorageSourceClear (libvirt.so.0 + 0x12b7d3)
                #3  0x00007fb5d40da76e vir_object_finalize (libvirt.so.0 + 0x10a76e)
                #4  0x00007fb5d3e54e1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
                #5  0x00007fb5d40da7d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
                #6  0x00007fb5d4123bc9 virDomainDiskDefFree (libvirt.so.0 + 0x153bc9)
                #7  0x00007fb5d4124ebc virDomainDefFree (libvirt.so.0 + 0x154ebc)
                #8  0x00007fb5d4125b31 virDomainObjDispose (libvirt.so.0 + 0x155b31)
                #9  0x00007fb5d40da76e vir_object_finalize (libvirt.so.0 + 0x10a76e)
                #10 0x00007fb5d3e54e1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
                #11 0x00007fb5d40da7d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
                #12 0x00007fb5d4125e8d virDomainObjEndAPI (libvirt.so.0 + 0x155e8d)
                #13 0x00007fb5c8760f8a qemuDomainUndefineFlags (libvirt_driver_qemu.so + 0xcaf8a)
                #14 0x00007fb5d42a61e9 virDomainUndefineFlags (libvirt.so.0 + 0x2d61e9)
                #15 0x0000559b1807ed08 remoteDispatchDomainUndefineFlagsHelper.lto_priv.0 (libvirtd + 0x46d08)
                #16 0x00007fb5d41bdab6 virNetServerProgramDispatch (libvirt.so.0 + 0x1edab6)
                #17 0x00007fb5d41c3b28 virNetServerHandleJob (libvirt.so.0 + 0x1f3b28)
                #18 0x00007fb5d40fbcf2 virThreadPoolWorker (libvirt.so.0 + 0x12bcf2)
                #19 0x00007fb5d4102e29 virThreadHelper (libvirt.so.0 + 0x132e29)
                #20 0x00007fb5d336d299 start_thread (libpthread.so.0 + 0x9299)
                #21 0x00007fb5d3b49863 __clone (libc.so.6 + 0x100863)

Comment 7 Peter Krempa 2021-04-09 08:08:26 UTC
Did your VM config have any <backingStore> elements for disks defined explicitly prior to startup? Ideally, post the full VM XML from before the VM was started.
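For reference, the persistent (pre-startup) XML can be dumped with:

# virsh dumpxml avocado-vt-vm1 --inactive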

Comment 8 Meina Li 2021-04-09 09:11:26 UTC
(In reply to Peter Krempa from comment #7)
> Did your VM config have any <backingStore> elements for disks defined
> explicitly prior to startup? Ideally, post the full VM XML from before
> the VM was started.

Yes, it includes <backingStore/> in the XML before startup. But this scenario passes on RHEL-AV 8.4.
And the guest can be undefined without error after I removed the element.

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </disk>

Besides this, it also hits this error for an iSCSI network disk even though it doesn't include <backingStore/> in the XML:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='iscsi' name='iqn.2021-04.com.virttest:emulated-iscsi.target/0'>
        <host name='127.0.0.1' port='3260'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </disk>

Similar stack trace for the iSCSI network disk:
Stack trace of thread 1103759:
#0  0x00007f1eb5b157bd virObjectUnref (libvirt.so.0 + 0x10a7bd)
#1  0x00007f1eb5b2d4c1 virStorageSourceBackingStoreClear (libvirt.so.0 + 0x1224c1)
#2  0x00007f1eb5b367d3 virStorageSourceClear (libvirt.so.0 + 0x12b7d3)
#3  0x00007f1eb5b1576e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#4  0x00007f1eb588fe1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#5  0x00007f1eb5b157d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#6  0x00007f1eb5b5ebc9 virDomainDiskDefFree (libvirt.so.0 + 0x153bc9)
#7  0x00007f1eb5b5febc virDomainDefFree (libvirt.so.0 + 0x154ebc)
#8  0x00007f1eb5b60b31 virDomainObjDispose (libvirt.so.0 + 0x155b31)
#9  0x00007f1eb5b1576e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#10 0x00007f1eb588fe1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#11 0x00007f1eb5b157d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#12 0x00007f1eb5b60e8d virDomainObjEndAPI (libvirt.so.0 + 0x155e8d)
#13 0x00007f1ea9980f8a qemuDomainUndefineFlags (libvirt_driver_qemu.so + 0xcaf8a)
#14 0x00007f1eb5ce11e9 virDomainUndefineFlags (libvirt.so.0 + 0x2d61e9)
#15 0x000055b06bdbcd08 remoteDispatchDomainUndefineFlagsHelper.lto_priv.0 (libvirtd + 0x46d08)
#16 0x00007f1eb5bf8ab6 virNetServerProgramDispatch (libvirt.so.0 + 0x1edab6)
#17 0x00007f1eb5bfeb28 virNetServerHandleJob (libvirt.so.0 + 0x1f3b28)
#18 0x00007f1eb5b36cf2 virThreadPoolWorker (libvirt.so.0 + 0x12bcf2)
#19 0x00007f1eb5b3de29 virThreadHelper (libvirt.so.0 + 0x132e29)
#20 0x00007f1eb4da8299 start_thread (libpthread.so.0 + 0x9299)
#21 0x00007f1eb5584863 __clone (libc.so.6 + 0x100863)
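
For reference, the workaround mentioned above for the file-backed disk (dropping the explicit <backingStore/> element before starting the guest) can be scripted roughly as follows; this is only a sketch and the temporary file path is illustrative:

# virsh dumpxml avocado-vt-vm1 --inactive > /tmp/avocado-vt-vm1.xml
# sed -i '/<backingStore\/>/d' /tmp/avocado-vt-vm1.xml
# virsh define /tmp/avocado-vt-vm1.xml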

Comment 9 Peter Krempa 2021-04-09 10:28:53 UTC
Yes, that happens regardless of disk type. The handling differs depending on the presence of <backingStore/>.

Thanks for the info.

Comment 10 Peter Krempa 2021-04-12 15:50:50 UTC
Note that the bug may not always manifest itself in a crash, but it's easily observable when running libvirtd under valgrind.

Already after the 'virsh blockpull' step an invalid read is reported:

==2118357== Invalid read of size 8
==2118357==    at 0x496E121: VIR_IS_OBJECT (virobject.h:44)
==2118357==    by 0x496E121: virObjectUnref (virobject.c:377)
==2118357==    by 0x4A33ABC: virStorageSourceBackingStoreClear (storage_source_conf.c:1105)
==2118357==    by 0x4A34515: virStorageSourceClear (storage_source_conf.c:1142)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2261)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2256)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3598)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3562)
==2118357==    by 0x49DB40C: virDomainObjDispose (domain_conf.c:3744)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==  Address 0x1b92a140 is 32 bytes inside a block of size 520 free'd
==2118357==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==2118357==    by 0x4D9170C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAB9FF: g_slice_free1 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA0203: g_type_free_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x16033BA5: qemuBlockJobProcessEventCompletedCommit (qemu_blockjob.c:1229)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcludedTransition (qemu_blockjob.c:1546)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcluded (qemu_blockjob.c:1647)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcess (qemu_blockjob.c:1686)
==2118357==    by 0x16033BA5: qemuBlockJobUpdate (qemu_blockjob.c:1732)
==2118357==    by 0x160A8178: processJobStatusChangeEvent (qemu_driver.c:4195)
==2118357==    by 0x160A8178: qemuProcessEventHandler (qemu_driver.c:4355)
==2118357==    by 0x498A541: virThreadPoolWorker (virthreadpool.c:163)
==2118357==    by 0x4989BC4: virThreadHelper (virthread.c:233)
==2118357==    by 0x59C23F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==2118357==    by 0x51E0B52: clone (in /usr/lib64/libc-2.32.so)
==2118357==  Block was alloc'd at
==2118357==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==2118357==    by 0x4D94BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAC481: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DACB0D: g_slice_alloc0 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA4C19: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8D8C4: ??? (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8E69C: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8F310: g_object_new (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E093: virObjectNew (virobject.c:255)
==2118357==    by 0x4A33B12: virStorageSourceNew (storage_source_conf.c:1191)
==2118357==    by 0x4A33CF2: virStorageSourceCopy (storage_source_conf.c:801)
==2118357==    by 0x1610AAB4: qemuSnapshotDiskPrepareOne (qemu_snapshot.c:1068)

(note that the wrong function is printed in the trace: qemuBlockJobProcessEventCompletedCommit, whereas the bug is in qemuBlockJobProcessEventCompletedPull)
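
For reference, one way to run libvirtd under valgrind for this kind of check (a sketch; it assumes the monolithic libvirtd daemon and its socket units, and the valgrind options can be tuned):

# systemctl stop libvirtd libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket
# valgrind --track-origins=yes /usr/sbin/libvirtd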

Comment 11 Peter Krempa 2021-04-13 09:01:04 UTC
Fixed upstream:

commit b4d020790642fa4d7b8a6783b81d5d9d73cbe3d9
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:42:23 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Add backingStore terminators if base is NULL
    
    When doing a blockpull with NULL base the full contents of the disk are
    pulled into the topmost image which then becomes fully self-contained.
    
    qemuBlockJobProcessEventCompletedPull doesn't install the backing chain
    terminators though, although it's guaranteed that there will be no
    backing chain behind disk->src.
    
    Add the terminators for completeness and for disabling backing chain
    detection on further boots.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 46e748aa02cbd5923fa4b500352f528de35dc665
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:24:22 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Avoid dangling pointer after blockpull
    
    When doing a full block pull job (base == NULL) and the config XML
    contains a compatible disk, the completer function would leave a
    dangling pointer in 'cfgdisk->src->backingStore' as cfgdisk->src would
    be set to the value of 'cfgbase' which was always set to
    'cfgdisk->src->backingStore'.
    
    This is wrong though since for the live definition XML we set the
    respective counterpart to 'job->data.pull.base' which is NULL in the
    above scenario.
    
    This leads to an invalid pointer read when saving the config XML and may
    end up in a crash.
    
    Resolve it by setting 'cfgbase' only when 'job->data.pull.base' is
    non-NULL.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1946918
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>
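
With the first commit applied, a full pull should also leave an empty <backingStore/> terminator in the persistent XML, which can be checked after the pull with, for example (a sketch, using the domain name from the verification below):

# virsh dumpxml lmn --inactive | grep backingStore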

Comment 15 Meina Li 2021-06-09 02:08:04 UTC
Verified Version:
libvirt-7.3.0-1.el9.x86_64
qemu-kvm-6.0.0-3.el9.x86_64

Verified Steps:
1. Prepare a running guest.
# virsh domstate lmn 
running
# virsh dumpxml lmn --inactive | grep /disk -B6
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/lmn.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as lmn s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created
3. Do blockpull for the virtual disk.
# virsh blockpull lmn vda --wait --verbose --async
Block Pull: [100 %]
Pull complete
4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy lmn
Domain 'lmn' destroyed
# virsh undefine lmn --snapshots-metadata
Domain 'lmn' has been undefined

