1948619 – Undefine guest with --snapshots-metadata failed after blockpull

Bug 1948619 - Undefine guest with --snapshots-metadata failed after blockpull

Summary: Undefine guest with --snapshots-metadata failed after blockpull

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Advanced Virtualization
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	8.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	8.4
Assignee:	Peter Krempa
QA Contact:	Meina Li
Docs Contact:
URL:
Whiteboard:
Depends On:	1946918
Blocks:
TreeView+	depends on / blocked

Reported:	2021-04-12 15:56 UTC by Peter Krempa
Modified:	2021-11-16 08:29 UTC (History)
CC List:	9 users (show)
Fixed In Version:	libvirt-7.3.0-1.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1946918
Environment:
Last Closed:	2021-11-16 07:52:31 UTC
Type:	Bug
Target Upstream Version:	7.3.0
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2021:4684	0	None	None	None	2021-11-16 07:53:15 UTC

Description Peter Krempa 2021-04-12 15:56:03 UTC

The same bug is present also in rhel8-av. Please note that it may not manifest itself in a crash, see below for a way to find it using valgrind.

+++ This bug was initially created as a clone of Bug #1946918 +++

Description of problem:
Undefine guest with --snapshots-metadata failed after blockpull

Version-Release number of selected component (if applicable):
libvirt-7.0.0-4.el9.x86_64
qemu-kvm-5.2.0-11.el9.x86_64
kernel-5.11.0-2.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running guest.
# virsh domstate avocado-vt-vm1 
running

2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as avocado-vt-vm1 s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created

3. Do blockpull for the virtual disk.
# virsh blockpull avocado-vt-vm1 vda --wait --verbose --async
Block Pull: [100 %]
Pull complete

4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy avocado-vt-vm1 
Domain 'avocado-vt-vm1' destroyed
# virsh undefine avocado-vt-vm1 --snapshots-metadata
error: Disconnected from qemu:///system due to end of file
error: Failed to undefine domain 'avocado-vt-vm1'
error: End of file while reading data: Input/output error

Actual results:
Undefine the guest failed 

Expected results:
Undefine the guest successfully

Additional info:

--- Additional comment from Peter Krempa on 2021-04-09 10:08:26 CEST ---

Did your VM config have any <backingStore> for disks defined explicitly prior to the startup? Ideally post the full VM xml prior to the startup of the VM.

--- Additional comment from Meina Li on 2021-04-09 11:11:26 CEST ---

(In reply to Peter Krempa from comment #7)
> Did your VM config have any <backingStore> for disks defined explicitly
> prior to the startup? Ideally post the full VM xml prior to the startup of
> the VM.

Yes, it includes <backingStore/> in xml before startup. But this scenario can be passed in RHEL-AV 8.4.
And it can undefine without error after I removed the element. 

   <disk type='file' device='disk'>virStorageSourceClear
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </disk>

<snip>

--- Additional comment from Peter Krempa on 2021-04-12 17:50:50 CEST ---

Note that the bug may not always manifest itself in a crash, but it's easily observable when running libvirtd under valgrind.

Already after the 'virsh blockpull' step an invalid read is reported:

==2118357== Invalid read of size 8
==2118357==    at 0x496E121: VIR_IS_OBJECT (virobject.h:44)
==2118357==    by 0x496E121: virObjectUnref (virobject.c:377)
==2118357==    by 0x4A33ABC: virStorageSourceBackingStoreClear (storage_source_conf.c:1105)
==2118357==    by 0x4A34515: virStorageSourceClear (storage_source_conf.c:1142)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2261)
==2118357==    by 0x49BFE64: virDomainDiskDefFree (domain_conf.c:2256)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3598)
==2118357==    by 0x49DAB1B: virDomainDefFree (domain_conf.c:3562)
==2118357==    by 0x49DB40C: virDomainObjDispose (domain_conf.c:3744)
==2118357==    by 0x496DAED: vir_object_finalize (virobject.c:326)
==2118357==    by 0x4E8BA6F: g_object_unref (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==  Address 0x1b92a140 is 32 bytes inside a block of size 520 free'd
==2118357==    at 0x483A9F5: free (vg_replace_malloc.c:538)
==2118357==    by 0x4D9170C: g_free (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAB9FF: g_slice_free1 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA0203: g_type_free_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:380)
==2118357==    by 0x496E147: virObjectUnref (virobject.c:373)
==2118357==    by 0x16033BA5: qemuBlockJobProcessEventCompletedCommit (qemu_blockjob.c:1229)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcludedTransition (qemu_blockjob.c:1546)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcessConcluded (qemu_blockjob.c:1647)
==2118357==    by 0x16033BA5: qemuBlockJobEventProcess (qemu_blockjob.c:1686)
==2118357==    by 0x16033BA5: qemuBlockJobUpdate (qemu_blockjob.c:1732)
==2118357==    by 0x160A8178: processJobStatusChangeEvent (qemu_driver.c:4195)
==2118357==    by 0x160A8178: qemuProcessEventHandler (qemu_driver.c:4355)
==2118357==    by 0x498A541: virThreadPoolWorker (virthreadpool.c:163)
==2118357==    by 0x4989BC4: virThreadHelper (virthread.c:233)
==2118357==    by 0x59C23F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==2118357==    by 0x51E0B52: clone (in /usr/lib64/libc-2.32.so)
==2118357==  Block was alloc'd at
==2118357==    at 0x4839809: malloc (vg_replace_malloc.c:307)
==2118357==    by 0x4D94BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DAC481: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4DACB0D: g_slice_alloc0 (in /usr/lib64/libglib-2.0.so.0.6600.8)
==2118357==    by 0x4EA4C19: g_type_create_instance (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8D8C4: ??? (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8E69C: g_object_new_with_properties (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x4E8F310: g_object_new (in /usr/lib64/libgobject-2.0.so.0.6600.8)
==2118357==    by 0x496E093: virObjectNew (virobject.c:255)
==2118357==    by 0x4A33B12: virStorageSourceNew (storage_source_conf.c:1191)
==2118357==    by 0x4A33CF2: virStorageSourceCopy (storage_source_conf.c:801)
==2118357==    by 0x1610AAB4: qemuSnapshotDiskPrepareOne (qemu_snapshot.c:1068)

(note that there's wrong function printed in the trace: qemuBlockJobProcessEventCompletedCommit, whereas the bug is in qemuBlockJobProcessEventCompletedPull)

Comment 1 Peter Krempa 2021-04-13 09:00:45 UTC

Fixed upstream:

commit b4d020790642fa4d7b8a6783b81d5d9d73cbe3d9
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:42:23 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Add backingStore terminators if base is NULL
    
    When doing a blockpull with NULL base the full contents of the disk are
    pulled into the topmost image which then becomes fully self-contained.
    
    qemuBlockJobProcessEventCompletedPull doesn't install the backing chain
    terminators though, although it's guaranteed that there will be no
    backing chain behind disk->src.
    
    Add the terminators for completness and for disabling backing chain
    detection on further boots.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>

commit 46e748aa02cbd5923fa4b500352f528de35dc665
Author: Peter Krempa <pkrempa>
Date:   Mon Apr 12 17:24:22 2021 +0200

    qemuBlockJobProcessEventCompletedPull: Avoid dangling pointer after blockpull
    
    When doing a full block pull job (base == NULL) and the config XML
    contains a compatible disk, the completer function would leave a
    dangling pointer in 'cfgdisk->src->backingStore' as cfgdisk->src would
    be set to the value of 'cfgbase' which was always set to
    'cfgdisk->src->backingStore'.
    
    This is wrong though since for the live definition XML we set the
    respective counterpart to 'job->data.pull.base' which is NULL in the
    above scenario.
    
    This leads to a invalid pointer read when saving the config XML and may
    end up in a crash.
    
    Resolve it by setting 'cfgbase' only when 'job->data.pull.base' is
    non-NULL.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1946918
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Pavel Hrdina <phrdina>

Comment 5 Meina Li 2021-05-19 03:10:07 UTC

Verified Version:
libvirt-7.3.0-1.module+el8.5.0+11004+f4810536.x86_64
qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64


Verified Steps:
1. Prepare a running guest.
# virsh domstate lmn 
running
# virsh dumpxml lmn --inactive | grep /disk -B6
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>

2. Create snapshots for the guest.
# for i in {1..4}; do virsh snapshot-create-as lmn s$i --disk-only; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created
3. Do blockpull for the virtual disk.
# virsh blockpull lmn vda --wait --verbose --async
Block Pull: [100 %]
Pull complete
4. Destroy the guest and undefine it with --snapshots-metadata.
# virsh destroy lmn
Domain 'lmn' destroyed
# virsh undefine lmn --snapshots-metadata
Domain 'lmn' has been undefined

Comment 7 errata-xmlrpc 2021-11-16 07:52:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684

Note You need to log in before you can comment on or make changes to this bug.