Bug 1946918
Summary: Undefine guest with --snapshots-metadata failed after blockpull

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 9 |
| Component | libvirt |
| Version | 9.0 |
| Status | CLOSED CURRENTRELEASE |
| Severity | unspecified |
| Priority | medium |
| Reporter | Meina Li \<meili\> |
| Assignee | Peter Krempa \<pkrempa\> |
| QA Contact | Meina Li \<meili\> |
| CC | jdenemar, pkrempa, virt-maint, xuzhang |
| Target Milestone | beta |
| Target Release | --- |
| Keywords | Automation, Triaged |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | libvirt-7.3.0-1.el9 |
| Target Upstream Version | 7.3.0 |
| Clones | 1948619 (view as bug list) |
| Bug Blocks | 1948619 |
| Type | Bug |
| Regression | --- |
| Last Closed | 2021-12-07 21:57:54 UTC |
Peter Krempa (comment #1):

I didn't manage to reproduce the problem. Could you please also attach debug logs of libvirtd from when the issue reproduces?

Additionally, the backtrace which is attached is from an idle libvirtd, not from a crashed one, so it doesn't help at all. If libvirtd really crashed, the backtrace needs to be collected from the crashed process.

Meina Li (in reply to Peter Krempa from comment #1):

I found the guest is undefined successfully even though it reports the "Input/output error", and libvirtd may not have crashed, judging from the backtrace. Please check the libvirtd log. Thank you.

Created attachment 1770110 [details]: libvirtd.log
Peter Krempa:

Could you please try getting the coredump backtrace from the actual crash, e.g. via "coredumpctl dump /usr/sbin/libvirtd"?

Created attachment 1770504 [details]: coredump.log
Peter Krempa:

Thank you! The important part is:

```
Stack trace of thread 1099086:
#0  0x00007fb5d40da7bd virObjectUnref (libvirt.so.0 + 0x10a7bd)
#1  0x00007fb5d40f24c1 virStorageSourceBackingStoreClear (libvirt.so.0 + 0x1224c1)
#2  0x00007fb5d40fb7d3 virStorageSourceClear (libvirt.so.0 + 0x12b7d3)
#3  0x00007fb5d40da76e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#4  0x00007fb5d3e54e1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#5  0x00007fb5d40da7d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#6  0x00007fb5d4123bc9 virDomainDiskDefFree (libvirt.so.0 + 0x153bc9)
#7  0x00007fb5d4124ebc virDomainDefFree (libvirt.so.0 + 0x154ebc)
#8  0x00007fb5d4125b31 virDomainObjDispose (libvirt.so.0 + 0x155b31)
#9  0x00007fb5d40da76e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#10 0x00007fb5d3e54e1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#11 0x00007fb5d40da7d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#12 0x00007fb5d4125e8d virDomainObjEndAPI (libvirt.so.0 + 0x155e8d)
#13 0x00007fb5c8760f8a qemuDomainUndefineFlags (libvirt_driver_qemu.so + 0xcaf8a)
#14 0x00007fb5d42a61e9 virDomainUndefineFlags (libvirt.so.0 + 0x2d61e9)
#15 0x0000559b1807ed08 remoteDispatchDomainUndefineFlagsHelper.lto_priv.0 (libvirtd + 0x46d08)
#16 0x00007fb5d41bdab6 virNetServerProgramDispatch (libvirt.so.0 + 0x1edab6)
#17 0x00007fb5d41c3b28 virNetServerHandleJob (libvirt.so.0 + 0x1f3b28)
#18 0x00007fb5d40fbcf2 virThreadPoolWorker (libvirt.so.0 + 0x12bcf2)
#19 0x00007fb5d4102e29 virThreadHelper (libvirt.so.0 + 0x132e29)
#20 0x00007fb5d336d299 start_thread (libpthread.so.0 + 0x9299)
#21 0x00007fb5d3b49863 __clone (libc.so.6 + 0x100863)
```

Did your VM config have any \<backingStore\> for disks defined explicitly prior to the startup? Ideally post the full VM XML prior to the startup of the VM.
Meina Li (in reply to Peter Krempa from comment #7):

Yes, the XML includes \<backingStore/\> before startup. But this scenario passes on RHEL-AV 8.4, and the guest can be undefined without error after I removed the element.

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</disk>
```

Besides this, it also hits the error for an iSCSI network disk even though its XML doesn't include \<backingStore/\>:

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='iscsi' name='iqn.2021-04.com.virttest:emulated-iscsi.target/0'>
    <host name='127.0.0.1' port='3260'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</disk>
```

Similar stack trace for the iSCSI network disk:

```
Stack trace of thread 1103759:
#0  0x00007f1eb5b157bd virObjectUnref (libvirt.so.0 + 0x10a7bd)
#1  0x00007f1eb5b2d4c1 virStorageSourceBackingStoreClear (libvirt.so.0 + 0x1224c1)
#2  0x00007f1eb5b367d3 virStorageSourceClear (libvirt.so.0 + 0x12b7d3)
#3  0x00007f1eb5b1576e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#4  0x00007f1eb588fe1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#5  0x00007f1eb5b157d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#6  0x00007f1eb5b5ebc9 virDomainDiskDefFree (libvirt.so.0 + 0x153bc9)
#7  0x00007f1eb5b5febc virDomainDefFree (libvirt.so.0 + 0x154ebc)
#8  0x00007f1eb5b60b31 virDomainObjDispose (libvirt.so.0 + 0x155b31)
#9  0x00007f1eb5b1576e vir_object_finalize (libvirt.so.0 + 0x10a76e)
#10 0x00007f1eb588fe1c g_object_unref (libgobject-2.0.so.0 + 0x20e1c)
#11 0x00007f1eb5b157d8 virObjectUnref (libvirt.so.0 + 0x10a7d8)
#12 0x00007f1eb5b60e8d virDomainObjEndAPI (libvirt.so.0 + 0x155e8d)
#13 0x00007f1ea9980f8a qemuDomainUndefineFlags (libvirt_driver_qemu.so + 0xcaf8a)
#14 0x00007f1eb5ce11e9 virDomainUndefineFlags (libvirt.so.0 + 0x2d61e9)
#15 0x000055b06bdbcd08 remoteDispatchDomainUndefineFlagsHelper.lto_priv.0 (libvirtd + 0x46d08)
#16 0x00007f1eb5bf8ab6 virNetServerProgramDispatch (libvirt.so.0 + 0x1edab6)
#17 0x00007f1eb5bfeb28 virNetServerHandleJob (libvirt.so.0 + 0x1f3b28)
#18 0x00007f1eb5b36cf2 virThreadPoolWorker (libvirt.so.0 + 0x12bcf2)
#19 0x00007f1eb5b3de29 virThreadHelper (libvirt.so.0 + 0x132e29)
#20 0x00007f1eb4da8299 start_thread (libpthread.so.0 + 0x9299)
#21 0x00007f1eb5584863 __clone (libc.so.6 + 0x100863)
```

Peter Krempa:

Yes, that will happen regardless of disk type; the handling only differs depending on the presence of \<backingStore/\>. Thanks for the info.

Note that the bug may not always manifest itself in a crash, but it's easily observable when running libvirtd under valgrind. Already after the 'virsh blockpull' step an invalid read is reported:

```
==2118357== Invalid read of size 8
==2118357==    at 0x496E121: VIR_IS_OBJECT (virobject.h:44)
==2118357==    by 0x496E121: virObjectUnref (virobject.c:377)
==2118357==    by 0x4A33ABC: virStorageSourceBackingStoreClear (storage_source_conf.c:1105)
==2118357==    by 0x4A34515: virStorageSourceClear (storage_source_conf.c:1142)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2261)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2256)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3598)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3562)
==2118357==    by 0x49DB40C: virDomainObjDispose (domain_conf.c:3744)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==  Address 0x1b92a140 is 32 bytes inside a block of size 520 free'd
==2118357==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==2118357==    by 0x4D9170C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAB9FF: g_slice_free1 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA0203: g_type_free_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x16033BA5: qemuBlockJobProcessEventCompletedCommit (qemu_blockjob.c:1229)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcludedTransition (qemu_blockjob.c:1546)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcluded (qemu_blockjob.c:1647)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcess (qemu_blockjob.c:1686)
==2118357==    by 0x16033BA5: qemuBlockJobUpdate (qemu_blockjob.c:1732)
==2118357==    by 0x160A8178: processJobStatusChangeEvent (qemu_driver.c:4195)
==2118357==    by 0x160A8178: qemuProcessEventHandler (qemu_driver.c:4355)
==2118357==    by 0x498A541: virThreadPoolWorker (virthreadpool.c:163)
==2118357==    by 0x4989BC4: virThreadHelper (virthread.c:233)
==2118357==    by 0x59C23F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==2118357==    by 0x51E0B52: clone (in /usr/lib64/libc-2.32.so)
==2118357==  Block was alloc'd at
==2118357==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==2118357==    by 0x4D94BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAC481: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DACB0D: g_slice_alloc0 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA4C19: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8D8C4: ??? (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8E69C: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8F310: g_object_new (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E093: virObjectNew (virobject.c:255)
==2118357==    by 0x4A33B12: virStorageSourceNew (storage_source_conf.c:1191)
==2118357==    by 0x4A33CF2: virStorageSourceCopy (storage_source_conf.c:801)
==2118357==    by 0x1610AAB4: qemuSnapshotDiskPrepareOne (qemu_snapshot.c:1068)
```

(Note that the wrong function is printed in the trace: qemuBlockJobProcessEventCompletedCommit, whereas the bug is in qemuBlockJobProcessEventCompletedPull.)

Fixed upstream:

```
commit b4d020790642fa4d7b8a6783b81d5d9d73cbe3d9
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:42:23 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Add backingStore terminators if base is NULL

    When doing a blockpull with NULL base the full contents of the disk are
    pulled into the topmost image which then becomes fully self-contained.

    qemuBlockJobProcessEventCompletedPull doesn't install the backing chain
    terminators though, although it's guaranteed that there will be no
    backing chain behind disk->src. Add the terminators for completness and
    for disabling backing chain detection on further boots.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 46e748aa02cbd5923fa4b500352f528de35dc665
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:24:22 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Avoid dangling pointer after blockpull

    When doing a full block pull job (base == NULL) and the config XML
    contains a compatible disk, the completer function would leave a
    dangling pointer in 'cfgdisk->src->backingStore' as cfgdisk->src would
    be set to the value of 'cfgbase' which was always set to
    'cfgdisk->src->backingStore'.

    This is wrong though since for the live definition XML we set the
    respective counterpart to 'job->data.pull.base' which is NULL in the
    above scenario.

    This leads to a invalid pointer read when saving the config XML and may
    end up in a crash.

    Resolve it by setting 'cfgbase' only when 'job->data.pull.base' is
    non-NULL.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1946918
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>
```

Verified Version:
libvirt-7.3.0-1.el9.x86_64
qemu-kvm-6.0.0-3.el9.x86_64

Verified Steps:

1. Prepare a running guest.

```
# virsh domstate lmn
running

# virsh dumpxml lmn --inactive | grep /disk -B6
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/lmn.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
```

2. Create snapshots for the guest.

```
# for i in {1..4}; do virsh snapshot-create-as lmn s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created
```

3. Do blockpull for the virtual disk.

```
# virsh blockpull lmn vda --wait --verbose --async
Block Pull: [100 %]
Pull complete
```

4. Destroy the guest and undefine it with --snapshots-metadata.

```
# virsh destroy lmn
Domain 'lmn' destroyed

# virsh undefine lmn --snapshots-metadata
Domain 'lmn' has been undefined
```
Created attachment 1769793 [details]: bt.log

Description of problem:
Undefine guest with --snapshots-metadata failed after blockpull

Version-Release number of selected component (if applicable):
libvirt-7.0.0-4.el9.x86_64
qemu-kvm-5.2.0-11.el9.x86_64
kernel-5.11.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Prepare a running guest.

```
# virsh domstate avocado-vt-vm1
running
```

2. Create snapshots for the guest.

```
# for i in {1..4}; do virsh snapshot-create-as avocado-vt-vm1 s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created
```

3. Do blockpull for the virtual disk.

```
# virsh blockpull avocado-vt-vm1 vda --wait --verbose --async
Block Pull: [100 %]
Pull complete
```

4. Destroy the guest and undefine it with --snapshots-metadata.

```
# virsh destroy avocado-vt-vm1
Domain 'avocado-vt-vm1' destroyed

# virsh undefine avocado-vt-vm1 --snapshots-metadata
error: Disconnected from qemu:///system due to end of file
error: Failed to undefine domain 'avocado-vt-vm1'
error: End of file while reading data: Input/output error
```

Actual results:
Undefine of the guest failed.

Expected results:
Undefine of the guest succeeds.

Additional info: