The same bug is also present in rhel8-av. Please note that it may not manifest itself in a crash; see below for a way to find it using valgrind.

+++ This bug was initially created as a clone of Bug #1946918 +++

Description of problem:
Undefining a guest with --snapshots-metadata fails after a blockpull.

Version-Release number of selected component (if applicable):
libvirt-7.0.0-4.el9.x86_64
qemu-kvm-5.2.0-11.el9.x86_64
kernel-5.11.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running guest.
# virsh domstate avocado-vt-vm1
running

2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as avocado-vt-vm1 s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created

3. Do blockpull for the virtual disk.
# virsh blockpull avocado-vt-vm1 vda --wait --verbose --async
Block Pull: [100 %]
Pull complete

4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy avocado-vt-vm1
Domain 'avocado-vt-vm1' destroyed

# virsh undefine avocado-vt-vm1 --snapshots-metadata
error: Disconnected from qemu:///system due to end of file
error: Failed to undefine domain 'avocado-vt-vm1'
error: End of file while reading data: Input/output error

Actual results:
Undefining the guest fails.

Expected results:
The guest is undefined successfully.

Additional info:

--- Additional comment from Peter Krempa on 2021-04-09 10:08:26 CEST ---

Did your VM config have any <backingStore> for disks defined explicitly prior to the startup? Ideally post the full VM xml prior to the startup of the VM.

--- Additional comment from Meina Li on 2021-04-09 11:11:26 CEST ---

(In reply to Peter Krempa from comment #7)
> Did your VM config have any <backingStore> for disks defined explicitly
> prior to the startup? Ideally post the full VM xml prior to the startup of
> the VM.

Yes, the XML includes <backingStore/> before startup. This scenario passes in RHEL-AV 8.4, and the guest can be undefined without error after I removed the element.

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </disk>
<snip>
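For reference, the reproduction steps above correspond roughly to the following libvirt C API calls. This is only an illustrative sketch: the domain name 'avocado-vt-vm1' and disk target 'vda' are taken from the steps above, error handling is minimal, and waiting for the block job is reduced to a crude poll.

/* Illustrative only: drives the same sequence as the virsh steps above
 * through the libvirt public API. Build with e.g.:
 *   gcc repro.c $(pkg-config --cflags --libs libvirt) -o repro */
#include <stdio.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn)
        return 1;

    virDomainPtr dom = virDomainLookupByName(conn, "avocado-vt-vm1");
    if (!dom) {
        virConnectClose(conn);
        return 1;
    }

    /* step 2: four external disk-only snapshots
     * (same as 'virsh snapshot-create-as ... --disk-only') */
    for (int i = 1; i <= 4; i++) {
        char xml[128];
        snprintf(xml, sizeof(xml),
                 "<domainsnapshot><name>s%d</name></domainsnapshot>", i);
        virDomainSnapshotPtr snap =
            virDomainSnapshotCreateXML(dom, xml,
                                       VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY);
        if (snap)
            virDomainSnapshotFree(snap);
    }

    /* step 3: full pull (no base given), which is the affected case */
    virDomainBlockPull(dom, "vda", 0, 0);

    /* crude wait until the block job disappears; 'virsh --wait' does this
     * properly via events */
    virDomainBlockJobInfo info;
    while (virDomainGetBlockJobInfo(dom, "vda", &info, 0) > 0)
        sleep(1);

    /* step 4: destroy, then undefine together with the snapshot metadata */
    virDomainDestroy(dom);
    virDomainUndefineFlags(dom, VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA);

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}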
--- Additional comment from Peter Krempa on 2021-04-12 17:50:50 CEST ---

Note that the bug may not always manifest itself in a crash, but it's easily observable when running libvirtd under valgrind. Already after the 'virsh blockpull' step an invalid read is reported:

==2118357== Invalid read of size 8
==2118357==    at 0x496E121: VIR_IS_OBJECT (virobject.h:44)
==2118357==    by 0x496E121: virObjectUnref (virobject.c:377)
==2118357==    by 0x4A33ABC: virStorageSourceBackingStoreClear (storage_source_conf.c:1105)
==2118357==    by 0x4A34515: virStorageSourceClear (storage_source_conf.c:1142)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2261)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2256)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3598)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3562)
==2118357==    by 0x49DB40C: virDomainObjDispose (domain_conf.c:3744)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==  Address 0x1b92a140 is 32 bytes inside a block of size 520 free'd
==2118357==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==2118357==    by 0x4D9170C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAB9FF: g_slice_free1 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA0203: g_type_free_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x16033BA5: qemuBlockJobProcessEventCompletedCommit (qemu_blockjob.c:1229)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcludedTransition (qemu_blockjob.c:1546)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcluded (qemu_blockjob.c:1647)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcess (qemu_blockjob.c:1686)
==2118357==    by 0x16033BA5: qemuBlockJobUpdate (qemu_blockjob.c:1732)
==2118357==    by 0x160A8178: processJobStatusChangeEvent (qemu_driver.c:4195)
==2118357==    by 0x160A8178: qemuProcessEventHandler (qemu_driver.c:4355)
==2118357==    by 0x498A541: virThreadPoolWorker (virthreadpool.c:163)
==2118357==    by 0x4989BC4: virThreadHelper (virthread.c:233)
==2118357==    by 0x59C23F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==2118357==    by 0x51E0B52: clone (in /usr/lib64/libc-2.32.so)
==2118357==  Block was alloc'd at
==2118357==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==2118357==    by 0x4D94BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAC481: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DACB0D: g_slice_alloc0 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA4C19: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8D8C4: ??? (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8E69C: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8F310: g_object_new (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E093: virObjectNew (virobject.c:255)
==2118357==    by 0x4A33B12: virStorageSourceNew (storage_source_conf.c:1191)
==2118357==    by 0x4A33CF2: virStorageSourceCopy (storage_source_conf.c:801)
==2118357==    by 0x1610AAB4: qemuSnapshotDiskPrepareOne (qemu_snapshot.c:1068)

(Note that the trace prints the wrong function, qemuBlockJobProcessEventCompletedCommit, whereas the bug is in qemuBlockJobProcessEventCompletedPull.)
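To make the trace easier to read: the block that gets freed in the pull-completion handler is a virStorageSource in the disk's backing chain, but the persistent (config) definition keeps pointing at it, so the later undefine (or a config XML save) walks an already-freed object. The following is a deliberately simplified, compilable model of that situation; the struct and variable names are invented, plain malloc/free stands in for libvirt's refcounted objects, and the multi-image chain is collapsed to a single link.

/* Simplified model of the dangling pointer the trace shows; names are
 * invented for illustration and do not match libvirt's real structures. */
#include <stdio.h>
#include <stdlib.h>

typedef struct image image_t;
struct image {
    const char *path;
    image_t *backing;          /* stand-in for virStorageSource's backingStore */
};

int main(void)
{
    /* the chain created by the disk-only snapshots, collapsed to one link */
    image_t *chain = calloc(1, sizeof(*chain));
    chain->path = "s3 ... base";

    image_t *cfg_top = calloc(1, sizeof(*cfg_top));
    cfg_top->path = "s4";
    cfg_top->backing = chain;  /* the config definition references the chain */

    /* The pull job with base == NULL completes: the completion handler frees
     * the merged chain on behalf of the live definition, but nothing resets
     * the pointer kept by the config definition. */
    free(chain);
    /* cfg_top->backing now dangles. */

    /* Later, "virsh undefine --snapshots-metadata" (or saving the config XML)
     * walks cfg_top->backing again; that access is the "Invalid read ...
     * inside a block free'd" valgrind reports. Not executed here, since it
     * would be undefined behaviour:
     *
     *     free(cfg_top->backing);
     */

    /* The effect of the upstream fix (quoted below), in this model, is that
     * the config side never ends up pointing at the chain the live side
     * frees, so teardown stays safe: */
    cfg_top->backing = NULL;
    free(cfg_top->backing);    /* free(NULL) is a no-op */
    free(cfg_top);
    printf("teardown completed without touching freed memory\n");
    return 0;
}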
Fixed upstream:

commit b4d020790642fa4d7b8a6783b81d5d9d73cbe3d9
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:42:23 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Add backingStore terminators if base is NULL

    When doing a blockpull with NULL base the full contents of the disk are
    pulled into the topmost image which then becomes fully self-contained.
    qemuBlockJobProcessEventCompletedPull doesn't install the backing chain
    terminators though, although it's guaranteed that there will be no
    backing chain behind disk->src.

    Add the terminators for completeness and for disabling backing chain
    detection on further boots.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 46e748aa02cbd5923fa4b500352f528de35dc665
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:24:22 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Avoid dangling pointer after blockpull

    When doing a full block pull job (base == NULL) and the config XML
    contains a compatible disk, the completer function would leave a
    dangling pointer in 'cfgdisk->src->backingStore' as cfgdisk->src would
    be set to the value of 'cfgbase' which was always set to
    'cfgdisk->src->backingStore'.

    This is wrong though since for the live definition XML we set the
    respective counterpart to 'job->data.pull.base' which is NULL in the
    above scenario.

    This leads to an invalid pointer read when saving the config XML and
    may end up in a crash.

    Resolve it by setting 'cfgbase' only when 'job->data.pull.base' is
    non-NULL.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1946918
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>
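Read together, the two commits amount to the following shape of logic in the pull-completion handler. The sketch below is a loose, compilable paraphrase with invented identifiers (image_t, pull_job_t, chain_terminate, ...); it is not the actual libvirt code.

/* Loose paraphrase of the fixed completer logic described above; all
 * identifiers are invented for illustration, not libvirt's real ones. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct image image_t;
struct image {
    image_t *backing;   /* stand-in for virStorageSource backingStore */
    bool probe;         /* stand-in for "detect backing chain on next boot" */
};

typedef struct {
    image_t *src;       /* stand-in for virDomainDiskDef->src */
} disk_t;

typedef struct {
    image_t *pull_base; /* stand-in for job->data.pull.base; NULL means full pull */
} pull_job_t;

/* stand-in for installing a backing-chain terminator (<backingStore/>) */
static void chain_terminate(image_t *img)
{
    img->backing = NULL;
    img->probe = false;
}

static void pull_completed(pull_job_t *job, disk_t *disk, disk_t *cfgdisk)
{
    image_t *cfgbase = NULL;

    /* Commit 46e748aa02cb: before the fix, cfgbase was taken from the config
     * disk unconditionally, so a full pull (pull_base == NULL) left the
     * config definition pointing into the chain the live side frees. After
     * the fix, a config-side base is only looked up when the pull stopped at
     * an intermediate image. */
    if (job->pull_base && cfgdisk)
        cfgbase = cfgdisk->src->backing;
    (void)cfgbase;      /* the real handler rewires the chains here */

    /* Commit b4d020790642: after a full pull the top image is
     * self-contained, so install terminators to disable backing-chain
     * detection on further boots. */
    if (!job->pull_base) {
        chain_terminate(disk->src);
        if (cfgdisk)
            chain_terminate(cfgdisk->src);
    }
}

int main(void)
{
    image_t top = { .backing = NULL, .probe = true };
    disk_t live = { .src = &top };
    image_t cfgtop = { .backing = NULL, .probe = true };
    disk_t cfg = { .src = &cfgtop };
    pull_job_t job = { .pull_base = NULL };   /* full pull, the affected case */

    pull_completed(&job, &live, &cfg);
    printf("probe after full pull: live=%d cfg=%d\n",
           live.src->probe, cfg.src->probe);
    return 0;
}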
Verified Version:
libvirt-7.3.0-1.module+el8.5.0+11004+f4810536.x86_64
qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64

Verified Steps:
1. Prepare a running guest.
# virsh domstate lmn
running
# virsh dumpxml lmn --inactive | grep /disk -B6
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>

2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as lmn s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created

3. Do blockpull for the virtual disk.
# virsh blockpull lmn vda --wait --verbose --async
Block Pull: [100 %]
Pull complete

4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy lmn
Domain 'lmn' destroyed

# virsh undefine lmn --snapshots-metadata
Domain 'lmn' has been undefined
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4684